The Pain Point
A scanned document has blank pages between sections, inserted by the scanner's automatic page detection. You need to remove them, but checking 100 pages by hand is tedious.
Agentic Solution
A Blank Page Detector that identifies and removes empty pages automatically.
Key Features:
- Smart Detection: Handles both pure white pages and near-white pages (fax artifacts).
- Threshold Control: Define what percentage of a page must be white for it to count as blank.
- Preview Mode: Review detected blanks before removing.
Phase 1: Commander (Quick Fix)
For quick cleaning.
Prompt:
"I have a PDF `scanned_doc.pdf` with blank pages. Write a Python script to:
- Detect: Identify pages that are >95% white/blank.
- Remove: Create new PDF without blank pages.
- Threshold: Adjustable via `--threshold 0.98` (default 0.95).
- Report: Print removed page numbers.
Use pdf2image to render and analyze each page. Handle encrypted PDFs."
Result: Clean document without wasted pages.
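The detection and removal steps the prompt asks for can be sketched roughly as follows. This is a minimal illustration, not the script the prompt would actually generate: `white_ratio`, `find_blank_pages`, `pages_to_keep`, and the `white_level` cutoff of 240 are names and values assumed here. Pages are represented as raw 8-bit grayscale bytes so the logic runs without a PDF; in a real script those bytes would likely come from pdf2image, for example by calling `convert_from_path(path, dpi=50)` and then `page.convert("L").tobytes()` on each rendered image.

```python
def white_ratio(gray_pixels: bytes, white_level: int = 240) -> float:
    """Return the fraction of 8-bit grayscale pixels at or above white_level."""
    if not gray_pixels:
        return 0.0  # no pixel data: never classify the page as blank
    white = sum(1 for value in gray_pixels if value >= white_level)
    return white / len(gray_pixels)


def find_blank_pages(pages, threshold: float = 0.95):
    """Return 1-based numbers of pages whose white ratio exceeds threshold."""
    return [
        number
        for number, pixels in enumerate(pages, start=1)
        if white_ratio(pixels) > threshold
    ]


def pages_to_keep(page_count: int, blank_pages):
    """Return the 1-based page numbers that survive the cleanup."""
    blank = set(blank_pages)
    return [n for n in range(1, page_count + 1) if n not in blank]


# Synthetic stand-ins for rendered pages (assumed data, not a real scan).
text_page = bytes([255] * 800 + [0] * 200)    # 80% white: real content
blank_page = bytes([255] * 990 + [90] * 10)   # 99% white: a few specks
fax_grey_page = bytes([245] * 1000)           # near-white fax background

pages = [text_page, blank_page, text_page, fax_grey_page]
removed = find_blank_pages(pages)
kept = pages_to_keep(len(pages), removed)

print("Removed pages:", removed)
print("Kept pages:", kept)
```

The `kept` list is what a PDF library would then use to copy the surviving pages into a new file. Which library does that is left open here, since the prompt does not name one.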
Phase 2: Architect (Permanent Tool)
For Document Scanners.
Engineering Prompt:
**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Blank Page Cleaner" Desktop App
**Objective:** A sanitation utility to auto-remove empty pages from scanned PDF stacks.
**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Image Engine: pdf2image, numpy (pixel analysis)
* Packaging: PyInstaller
**Functional Requirements:**
1. **UI Layout (PyQt6):**
* **Input:** Scanned PDF.
* **Calibration:** "Sensitivity Threshold" Slider (e.g., 98% white).
* **Review:** Grid view of "Candidates for Deletion" (Click to Keep/Discard).
* **Action:** "Clean & Save".
2. **Core Logic:**
* Render each page to low-res image.
* Convert to grayscale + Calculate white pixel ratio.
* Reconstruct PDF excluding marked pages.
* **Threading:** Analysis is slow; must be threaded.
3. **Deliverables:**
* `main.py`: Complete source code.
* `requirements.txt`: Dependencies.
* **Build Instructions:**
* Windows: `pyinstaller --onefile --noconsole main.py`
* macOS: `pyinstaller --windowed main.py`
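The threading requirement in the prompt above exists so the window stays responsive while pages are analyzed. The sketch below illustrates the pattern with the standard library's `threading` and `queue` modules rather than PyQt6, so it is a stand-in: in the actual app a `QThread` worker emitting signals would probably fill this role. `analysis_worker`, the event tuples, and the sample pages are all hypothetical.

```python
import queue
import threading


def white_ratio(gray_pixels: bytes, white_level: int = 240) -> float:
    """Fraction of 8-bit grayscale pixels at or above white_level."""
    if not gray_pixels:
        return 0.0
    return sum(1 for v in gray_pixels if v >= white_level) / len(gray_pixels)


def analysis_worker(pages, threshold, events):
    """Runs off the main thread and reports back through a queue."""
    candidates = []
    for number, pixels in enumerate(pages, start=1):
        if white_ratio(pixels) >= threshold:
            candidates.append(number)
        # A GUI would update a progress bar when it receives this event.
        events.put(("progress", number, len(pages)))
    events.put(("done", candidates))


# Synthetic pages (assumed data): fully white, half text, one dark speck.
pages = [
    bytes([255] * 100),
    bytes([255] * 50 + [0] * 50),
    bytes([255] * 99 + [0]),
]

events = queue.Queue()
worker = threading.Thread(
    target=analysis_worker, args=(pages, 0.98, events), daemon=True
)
worker.start()

# The main thread only consumes events, as a GUI event loop would.
candidates = None
progress_log = []
while candidates is None:
    event = events.get()
    if event[0] == "progress":
        progress_log.append(event[1])
    else:
        candidates = event[1]
worker.join()

print("Candidates for deletion:", candidates)
```

The worker never touches the interface directly. It only posts messages, which is the same discipline Qt's signal and slot mechanism enforces between a worker thread and the GUI thread.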
Prompt Decoding
- 95% Threshold: Scanned pages often have slight shadows at the edges, so even an empty page is rarely 100% white. Requiring pure white would miss genuinely blank pages.
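A small worked example may make the threshold reasoning concrete. It builds a synthetic 100×100 page that is empty except for a dark strip along the left edge, the kind of shadow a scanner lid can leave. The strip width, the pixel values, and the 240 white cutoff are assumptions chosen for illustration.

```python
WIDTH, HEIGHT = 100, 100
SHADOW_COLUMNS = 3   # hypothetical dark strip along the left edge
WHITE_LEVEL = 240    # grayscale values at or above this count as white

# Each row: a few dark shadow pixels followed by white paper.
row = [60] * SHADOW_COLUMNS + [255] * (WIDTH - SHADOW_COLUMNS)
pixels = row * HEIGHT

ratio = sum(1 for value in pixels if value >= WHITE_LEVEL) / len(pixels)

print(f"white ratio: {ratio:.2f}")                   # white ratio: 0.97
print("blank at 100% threshold:", ratio >= 1.0)      # False
print("blank at 95% threshold:", ratio > 0.95)       # True
```

The page holds no content, yet only the 95% rule classifies it as blank.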
Instructions
- Copy the prompt, then run it.
- Adjust threshold if needed.
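For reference, the `--threshold` option described in the Phase 1 prompt could be wired up along these lines. This is only a sketch of the command-line handling using the standard `argparse` module. The generated script may differ, and `threshold_value` and `build_parser` are hypothetical helper names.

```python
import argparse


def threshold_value(text: str) -> float:
    """Parse a threshold and reject values outside (0, 1]."""
    value = float(text)
    if not 0.0 < value <= 1.0:
        raise argparse.ArgumentTypeError("threshold must be between 0 and 1")
    return value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Remove blank pages from a scanned PDF."
    )
    parser.add_argument("pdf", help="path to the scanned PDF")
    parser.add_argument(
        "--threshold",
        type=threshold_value,
        default=0.95,
        help="white fraction above which a page counts as blank (default 0.95)",
    )
    return parser


# Explicit argument lists stand in for what a user would type in a shell.
default_args = build_parser().parse_args(["scanned_doc.pdf"])
strict_args = build_parser().parse_args(
    ["scanned_doc.pdf", "--threshold", "0.98"]
)

print(default_args.threshold)
print(strict_args.threshold)
```

Raising the value toward 1.0 makes detection stricter, so fewer pages are removed. Lowering it removes more, at the risk of dropping pages with very sparse content.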