PDF Keyword Search

😫 The Pain Point

You have 50 contract PDFs and need to find all mentions of “penalty clause” or “termination fee”. Opening each one and using Ctrl+F takes hours.

🚀 Agentic Solution

A Batch PDF Searcher that scans all documents and reports exact locations.

Key Features:

Multi-PDF Search: Scan entire folders at once.
Context Extraction: Shows surrounding text for each match.
Export Results: CSV or Excel report with file, page, and snippet.

⚔️ Phase 1: Commander (Quick Fix)

For quick searching.

Prompt:

“I have a folder named contracts that contains PDF files. Write a Python script using pdfplumber to:

Search: Find all occurrences of keywords ‘penalty’ and ‘termination’.

Output: For each match, print file name, page number, and surrounding context (50 chars).

Export: Save results to search_results.csv.

Support regex patterns with --regex flag. Handle unreadable PDFs (skip with warning).”

Result: Instant location of all relevant clauses.
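
For orientation, here is a minimal sketch of the kind of script this prompt tends to produce. It is not the definitive output: the function names (`build_patterns`, `find_matches`, `scan_folder`, `write_csv`) and the CSV column names are assumptions made for illustration, and the context width is read as 50 characters on each side of the match. The pdfplumber calls used are `pdfplumber.open`, `pdf.pages`, and `page.extract_text`.

```python
import csv
import re
from pathlib import Path


def build_patterns(keywords, use_regex=False):
    """Compile keywords; escape them unless regex mode is requested."""
    return [
        re.compile(k if use_regex else re.escape(k), re.IGNORECASE)
        for k in keywords
    ]


def find_matches(text, patterns, context=50):
    """Yield (matched_text, snippet) for every pattern hit in `text`."""
    for pattern in patterns:
        for m in pattern.finditer(text):
            start = max(0, m.start() - context)
            end = min(len(text), m.end() + context)
            # Collapse newlines so each snippet fits on one CSV row.
            snippet = " ".join(text[start:end].split())
            yield m.group(0), snippet


def scan_folder(folder, patterns, context=50):
    """Return (file, page, match, snippet) rows for all PDFs in `folder`."""
    import pdfplumber  # third-party: pip install pdfplumber

    rows = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        try:
            with pdfplumber.open(pdf_path) as pdf:
                for page_no, page in enumerate(pdf.pages, start=1):
                    # Image-only pages may yield no text at all.
                    text = page.extract_text() or ""
                    for match, snippet in find_matches(text, patterns, context):
                        rows.append((pdf_path.name, page_no, match, snippet))
        except Exception as exc:  # unreadable or encrypted PDF: warn and skip
            print(f"WARNING: skipping {pdf_path.name}: {exc}")
    return rows


def write_csv(rows, out_path="search_results.csv"):
    """Write the result rows with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "page", "match", "snippet"])
        writer.writerows(rows)
```

Calling `write_csv(scan_folder("contracts", build_patterns(["penalty", "termination"])))` would scan the folder and save the report. Scanned PDFs without a text layer produce no matches here; they would need OCR first, which this sketch does not attempt.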

🏗️ Phase 2: Architect (Permanent Tool)

For Legal/Compliance Teams.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "PDF Keyword Scanner" Desktop App

**Objective:** A search engine for local PDF repositories to find and export text matches.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* PDF Engine: pdfplumber, pypdf (the maintained successor to the deprecated PyPDF2)
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Search:** Target Folder, Keywords Input (Comma/Newline separated).
    *   **Filters:** "Regex Mode", "Case Sensitive".
    *   **Results:** TreeWidget grouping matches by File -> Page number.
    *   **Action:** "Export CSV Report".

2.  **Core Logic:**
    *   Iterate PDFs in folder.
    *   Extract text with layout awareness (`pdfplumber`).
    *   Match keywords and extract 100-char context window.
    *   **Threading:** Search is I/O- and CPU-heavy; run it in a worker thread pool (e.g. `QThreadPool`) so the GUI stays responsive.

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed main.py`
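
The core-logic requirements above can be sketched without any GUI code. The version below is one possible shape, not the app's actual implementation: `compile_patterns`, `search_pages`, and `search_all` are hypothetical names, the input is assumed to be page texts that were already extracted, and the 100-character window is assumed to apply on each side of a match. It uses the standard library's `ThreadPoolExecutor`; a PyQt6 app would more likely hand results back to the GUI thread through signals.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def compile_patterns(raw_keywords, regex_mode=False, case_sensitive=False):
    """Turn the comma/newline separated keyword box into compiled patterns.

    Caveat: splitting on commas also splits regexes such as a{1,3}.
    """
    terms = [t.strip() for t in re.split(r"[,\n]", raw_keywords) if t.strip()]
    flags = 0 if case_sensitive else re.IGNORECASE
    return [re.compile(t if regex_mode else re.escape(t), flags) for t in terms]


def search_pages(pages, patterns, window=100):
    """Search one document given as a list of page texts.

    Returns (page_number, match, context) tuples.
    """
    hits = []
    for page_no, text in enumerate(pages, start=1):
        for pattern in patterns:
            for m in pattern.finditer(text):
                lo, hi = max(0, m.start() - window), m.end() + window
                hits.append((page_no, m.group(0), text[lo:hi]))
    return hits


def search_all(documents, patterns, max_workers=4):
    """Search many documents in a thread pool, grouped file -> page -> hits.

    `documents` maps a file name to its list of page texts.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            name: pool.submit(search_pages, pages, patterns)
            for name, pages in documents.items()
        }
        for name, future in futures.items():
            for page_no, match, context in future.result():
                grouped[name][page_no].append((match, context))
    return grouped
```

The nested `file -> page -> hits` mapping mirrors the TreeWidget grouping in the UI requirements, so each level of the dictionary can become one level of the tree. Because text extraction is largely CPU-bound, threads mainly keep the interface responsive; a process pool may be the better choice if raw throughput matters.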

🧠 Prompt Decoding

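pdfplumber vs PyPDF2: pdfplumber builds on pdfminer.six and keeps the position of every character on the page, so its page-by-page text extraction generally preserves reading order and layout better than PyPDF2 (now maintained as pypdf). That makes the reported page numbers and context snippets more reliable.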

🛠️ Instructions

Install: pip install pdfplumber (Phase 2 also needs PyQt6 and pyinstaller)
Copy the prompt, then run the generated script.

Related Workflows

PDF Merge

PDF Split

PDF to Images

PDF Watermark

Extract Images from PDF

Blur Detector
