πŸ“„

Remove Blank Pages

Automatically detect and remove blank or near-blank pages from PDF documents.

Document ⭐ Beginner ⏱️ 2 minutes

😫 The Pain Point

A scanned document has blank pages between sections from the scanner’s automatic page detection. You need to remove them but checking 100 pages manually is tedious.

πŸš€ Agentic Solution

A Blank Page Detector that identifies and removes empty pages automatically.

Key Features:

  • Smart Detection: Handles both pure white and near-white (fax artifacts) pages.
  • Threshold Control: Define what percentage of the page must be blank.
  • Preview Mode: Review detected blanks before removing.

βš”οΈ Phase 1: Commander (Quick Fix)

For quick cleaning.

Prompt:

β€œI have a PDF scanned_doc.pdf with blank pages. Write a Python script to:

  1. Detect: Identify pages that are >95% white/blank.
  2. Remove: Create new PDF without blank pages.
  3. Threshold: Adjustable via --threshold 0.98 (default 0.95).
  4. Report: Print removed page numbers.

Use pdf2image to render and analyze each page. Handle encrypted PDFs.”

Result: Clean document without wasted pages.

πŸ—οΈ Phase 2: Architect (Permanent Tool)

For Document Scanners.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Blank Page Cleaner" Desktop App

**Objective:** A sanitation utility to auto-remove empty pages from scanned PDF stacks.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Image Engine: pdf2image, numpy (pixel analysis)
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Input:** Scanned PDF.
    *   **Calibration:** "Sensitivity Threshold" Slider (e.g., 98% white).
    *   **Review:** Grid view of "Candidates for Deletion" (Click to Keep/Discard).
    *   **Action:** "Clean & Save".

2.  **Core Logic:**
    *   Render each page to low-res image.
    *   Convert to grayscale + Calculate white pixel ratio.
    *   Reconstruct PDF excluding marked pages.
    *   **Threading:** Analysis is slow; must be threaded.

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed --noconsole main.py`

🧠 Prompt Decoding

  • 95% Threshold: Scanned pages often have slight shadows at edges. Pure 100% white may miss real blank pages.

πŸ› οΈ Instructions

  1. Copy Prompt β†’ Run.
  2. Adjust threshold if needed.

Related Workflows

Explore other categories

πŸ“¬

Get Started with Agentic Working

Subscribe to receive updates from AgenticWorking.io

πŸ“– Free eBook Guide πŸ“¦ 7 Ready-to-use Scripts πŸ”” Weekly Tips

No spam, unsubscribe anytime. Join 1,000+ subscribers.