😫 The Pain Point
You have 50 contract PDFs and need to find all mentions of “penalty clause” or “termination fee”. Opening each one and using Ctrl+F takes hours.
🚀 Agentic Solution
A Batch PDF Searcher that scans every document in one pass and reports the exact location (file and page) of each match.
Key Features:
- Multi-PDF Search: Scan entire folders at once.
- Context Extraction: Shows surrounding text for each match.
- Export Results: CSV or Excel report with file, page, and snippet.
⚔️ Phase 1: Commander (Quick Fix)
For quick searching.
Prompt:
“I have a folder `contracts` with PDF files. Write a Python script using pdfplumber to:
- Search: Find all occurrences of the keywords ‘penalty’ and ‘termination’.
- Output: For each match, print the file name, page number, and surrounding context (50 chars).
- Export: Save results to `search_results.csv`.
Support regex patterns with a `--regex` flag. Handle unreadable PDFs (skip them with a warning).”
Result: Every relevant clause located in one run, with file, page, and context.
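A script produced from this prompt might look roughly like the sketch below. It is an illustration, not the guaranteed output of the prompt: the file name `search_pdfs.py`, the helper names, the `-k`/`-o` options, and case-insensitive matching are assumptions. Only `pdfplumber.open`, `pdf.pages`, and `page.extract_text()` come from the library itself.

```python
# Sketch of the Phase 1 script (hypothetical name: search_pdfs.py).
# Assumptions: pdfplumber is installed; matching is case-insensitive.
import argparse
import csv
import re
import sys
from pathlib import Path

CONTEXT = 50  # characters of context kept on each side of a match


def build_pattern(keywords, use_regex=False):
    """Join keywords into one case-insensitive pattern; escape them unless --regex."""
    parts = keywords if use_regex else [re.escape(k) for k in keywords]
    return re.compile("|".join(f"(?:{p})" for p in parts), re.IGNORECASE)


def find_in_text(text, pattern, context=CONTEXT):
    """Yield (matched_text, snippet) per match, with line breaks flattened."""
    for m in pattern.finditer(text):
        start = max(m.start() - context, 0)
        yield m.group(0), text[start:m.end() + context].replace("\n", " ")


def search_folder(folder, pattern):
    """Return (file, page, match, snippet) rows; unreadable PDFs are skipped."""
    folder = Path(folder)
    pdfs = []
    if folder.is_dir():
        pdfs = sorted(p for p in folder.iterdir() if p.suffix.lower() == ".pdf")
    if not pdfs:
        print(f"WARNING: no PDF files found in {folder}", file=sys.stderr)
        return []
    import pdfplumber  # deferred so the pure-text helpers work without it

    rows = []
    for path in pdfs:
        file_rows = []
        try:
            with pdfplumber.open(str(path)) as pdf:
                for page_no, page in enumerate(pdf.pages, start=1):
                    text = page.extract_text() or ""  # image-only pages give no text
                    for match, snippet in find_in_text(text, pattern):
                        file_rows.append((path.name, page_no, match, snippet))
        except Exception as exc:  # corrupt or encrypted files raise various errors
            print(f"WARNING: skipping {path.name}: {exc}", file=sys.stderr)
            continue  # drop partial results so the whole file is skipped
        rows.extend(file_rows)
    return rows


def write_csv(rows, out_path):
    """Write one CSV row per match with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file", "page", "match", "context"])
        writer.writerows(rows)


def parse_args(argv=None):
    p = argparse.ArgumentParser(description="Search every PDF in a folder for keywords.")
    p.add_argument("folder", nargs="?", default="contracts")
    p.add_argument("-k", "--keywords", nargs="+", default=["penalty", "termination"])
    p.add_argument("--regex", action="store_true", help="treat keywords as regular expressions")
    p.add_argument("-o", "--output", default="search_results.csv")
    return p.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    rows = search_folder(args.folder, build_pattern(args.keywords, args.regex))
    if not rows:
        print("No matches found; nothing written.")
        return
    for name, page, _match, snippet in rows:
        print(f"{name} | page {page} | ...{snippet}...")
    write_csv(rows, args.output)
    print(f"{len(rows)} matches saved to {args.output}")


if __name__ == "__main__":
    main()
```

Usage, under the same assumptions: `python search_pdfs.py contracts -k penalty "termination fee"`, or `python search_pdfs.py --regex -k "terminat\w+"` to catch "termination", "terminate", and similar forms. Joining all keywords into one alternation means each page's text is scanned once rather than once per keyword.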
🏗️ Phase 2: Architect (Permanent Tool)
For Legal/Compliance Teams.
Engineering Prompt:
**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "PDF Keyword Scanner" Desktop App
**Objective:** A search engine for local PDF repositories to find and export text matches.
**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* PDF Engine: pdfplumber (primary), PyPDF2 (fallback)
* Packaging: PyInstaller
**Functional Requirements:**
1. **UI Layout (PyQt6):**
* **Search:** Target Folder, Keywords Input (Comma/Newline separated).
* **Filters:** "Regex Mode", "Case Sensitive".
* **Results:** `QTreeWidget` grouping matches by File → Page number.
* **Action:** "Export CSV Report".
2. **Core Logic:**
* Iterate over every PDF in the folder.
* Extract text with layout awareness (`pdfplumber`).
* Match keywords and extract a 100-char context window around each match.
* **Threading:** Search is I/O- and CPU-heavy; run it in a worker pool off the GUI thread so the UI stays responsive.
3. **Deliverables:**
* `main.py`: Complete source code.
* `requirements.txt`: Dependencies.
* **Build Instructions:**
* Windows: `pyinstaller --onefile --noconsole main.py`
* macOS: `pyinstaller --windowed main.py`
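The Core Logic and Results requirements can be prototyped without any Qt code, which keeps them unit-testable. The sketch below is one possible shape under stated assumptions: the function names `parse_keywords`, `compile_pattern`, `scan_files`, and `group_for_tree` are hypothetical, and the per-file search is passed in as `search_file`, assumed to be any callable that runs pdfplumber over one file and returns `(page, match, snippet)` tuples.

```python
# Qt-free sketch of the Phase 2 core; all function names are hypothetical.
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed


def parse_keywords(raw):
    """Split the Keywords input on commas/newlines, dropping blanks and duplicates."""
    keywords = []
    for item in re.split(r"[,\n]", raw):
        item = item.strip()  # also removes the "\r" left by Windows line endings
        if item and item not in keywords:
            keywords.append(item)
    return keywords


def compile_pattern(keywords, regex_mode=False, case_sensitive=False):
    """Map the "Regex Mode" and "Case Sensitive" filters onto one compiled pattern."""
    parts = keywords if regex_mode else [re.escape(k) for k in keywords]
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile("|".join(f"(?:{p})" for p in parts), flags)


def scan_files(paths, search_file, max_workers=4, on_progress=None):
    """Run search_file(path) over many PDFs in a thread pool.

    Returns (results, errors): results maps path -> list of (page, match, snippet);
    errors maps path -> message, so one bad file does not abort the scan.
    """
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(search_file, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:
                errors[path] = str(exc)
            if on_progress is not None:
                on_progress(done, len(futures))  # e.g. forwarded to a progress signal
    return results, errors


def group_for_tree(results):
    """Nest hits as {file: {page: [(match, snippet), ...]}} for File -> Page rows."""
    tree = {}
    for path in sorted(results):
        pages = defaultdict(list)
        for page, match, snippet in results[path]:
            pages[page].append((match, snippet))
        if pages:  # files without matches get no tree node
            tree[path] = dict(sorted(pages.items()))
    return tree
```

In the PyQt6 app, `scan_files` would run off the GUI thread (for example inside a `QThread` worker) and hand progress and results back through signals; the nested dict from `group_for_tree` maps onto one top-level `QTreeWidgetItem` per file with child items per page. Because pdfplumber's pdfminer.six-based parsing is pure Python, threads are largely serialized by the GIL; `ProcessPoolExecutor` has the same interface and can use several cores, provided `search_file` is a picklable module-level function.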
🧠 Prompt Decoding
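- pdfplumber vs PyPDF2: pdfplumber (built on pdfminer.six) tracks character positions, so it generally extracts text more faithfully and can approximate the page layout (`extract_text(layout=True)`); PyPDF2 is lighter-weight but its text extraction is less precise.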
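To make the comparison concrete, here is a sketch that tries pdfplumber first and falls back to PyPDF2 when it fails. The fallback order and the helper names (`with_pdfplumber`, `with_pypdf2`, `extract_pages`) are assumptions; the library calls used are `pdfplumber.open`, `page.extract_text(layout=...)`, and PyPDF2's `PdfReader(...).pages` with `page.extract_text()` (PyPDF2 2.x or later).

```python
# Sketch: pdfplumber first, PyPDF2 as a fallback (the order is an assumption).
def with_pdfplumber(path, layout=False):
    import pdfplumber  # imported lazily so a missing engine is just a failed attempt

    with pdfplumber.open(path) as pdf:
        # layout=True attempts to mimic the page's layout using spacing
        return [page.extract_text(layout=layout) or "" for page in pdf.pages]


def with_pypdf2(path):
    from PyPDF2 import PdfReader  # PyPDF2 2.x/3.x API

    reader = PdfReader(path)
    return [page.extract_text() or "" for page in reader.pages]


def extract_pages(path, engines=(with_pdfplumber, with_pypdf2)):
    """Return one text string per page from the first engine that succeeds."""
    errors = []
    for engine in engines:
        try:
            return engine(path)
        except Exception as exc:  # ImportError, missing file, parse or decryption errors
            errors.append(f"{engine.__name__}: {exc}")
    raise RuntimeError("all PDF engines failed: " + "; ".join(errors))
```

PyPDF2 is no longer maintained; its successor `pypdf` exposes the same `PdfReader` class, so `from pypdf import PdfReader` is a drop-in change here. Neither engine performs OCR, so scanned, image-only PDFs return empty text from both and do not trigger the fallback.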
🛠️ Instructions
- Install: `pip install pdfplumber` (Phase 2 also needs `pip install PyQt6 PyPDF2 pyinstaller`).
- Copy Prompt → Run.