
PDF Keyword Search

Search for keywords across multiple PDFs and export results with page references.

Document ⭐⭐ Intermediate ⏱️ 5 minutes

😫 The Pain Point

You have 50 contract PDFs and need to find all mentions of “penalty clause” or “termination fee”. Opening each one and using Ctrl+F takes hours.

🚀 Agentic Solution

A batch PDF searcher that scans every document in a folder and reports the file and page of each match.

Key Features:

  • Multi-PDF Search: Scan entire folders at once.
  • Context Extraction: Shows surrounding text for each match.
  • Export Results: CSV or Excel report with file, page, and snippet.

⚔️ Phase 1: Commander (Quick Fix)

For quick searching.

Prompt:

“I have a folder named contracts that contains PDF files. Write a Python script using pdfplumber to:

  1. Search: Find all occurrences of the keywords ‘penalty’ and ‘termination’.
  2. Output: For each match, print the file name, page number, and surrounding context (50 characters on each side).
  3. Export: Save the results to search_results.csv.

Support regex patterns via a --regex flag. Handle unreadable PDFs by skipping them with a warning.”

Result: Instant location of all relevant clauses.
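For orientation, a script produced from this prompt might look roughly like the sketch below. It is an illustration under assumptions, not the output of any particular assistant: the helper names (build_pattern, find_matches, search_pdf), the -k/--keywords and -o/--output options, and the "keyword" CSV column are invented for this example, and "50 chars" is read as 50 characters on each side of the match. Only pdfplumber.open, pdf.pages, and page.extract_text come from pdfplumber itself.

```python
import argparse
import csv
import re
import sys
from pathlib import Path

CONTEXT = 50  # characters kept on each side of a match (assumed reading of "50 chars")


def build_pattern(keywords, use_regex=False):
    """Combine the keywords into one case-insensitive pattern."""
    parts = keywords if use_regex else [re.escape(k) for k in keywords]
    return re.compile("|".join(f"(?:{p})" for p in parts), re.IGNORECASE)


def find_matches(text, pattern, context=CONTEXT):
    """Yield (matched_text, snippet) for every hit in one page of text."""
    for m in pattern.finditer(text):
        start = max(m.start() - context, 0)
        end = min(m.end() + context, len(text))
        # Collapse line breaks so each snippet fits on one CSV row.
        yield m.group(0), " ".join(text[start:end].split())


def search_pdf(path, pattern):
    """Yield (file, page, keyword, snippet) rows for one PDF."""
    import pdfplumber  # imported lazily so the helpers above work without it

    with pdfplumber.open(path) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            text = page.extract_text() or ""  # scanned pages may have no text layer
            for keyword, snippet in find_matches(text, pattern):
                yield path.name, number, keyword, snippet


def main(argv=None):
    parser = argparse.ArgumentParser(description="Search PDFs for keywords.")
    parser.add_argument("folder", nargs="?", default="contracts")
    parser.add_argument("-k", "--keywords", nargs="+", default=["penalty", "termination"])
    parser.add_argument("--regex", action="store_true", help="treat keywords as regex patterns")
    parser.add_argument("-o", "--output", default="search_results.csv")
    args = parser.parse_args(argv)

    pattern = build_pattern(args.keywords, args.regex)
    rows = []
    for path in sorted(Path(args.folder).glob("*.pdf")):
        try:
            rows.extend(search_pdf(path, pattern))
        except Exception as exc:  # corrupt or encrypted file: warn and move on
            print(f"WARNING: skipping {path.name}: {exc}", file=sys.stderr)

    for file, page, keyword, snippet in rows:
        print(f"{file} p.{page} [{keyword}] ...{snippet}...")

    with open(args.output, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["file", "page", "keyword", "snippet"])
        writer.writerows(rows)


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as search_pdfs.py, it could be run as `python search_pdfs.py contracts --regex -k "penalt(y|ies)" "terminat\w+"`. Scanned PDFs without a text layer return no text, so they produce no matches unless OCR is applied first.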

🏗️ Phase 2: Architect (Permanent Tool)

For Legal/Compliance Teams.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "PDF Keyword Scanner" Desktop App

**Objective:** A search engine for local PDF repositories to find and export text matches.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* PDF Engine: pdfplumber, pypdf (successor to the deprecated PyPDF2)
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Search:** Target Folder, Keywords Input (Comma/Newline separated).
    *   **Filters:** "Regex Mode", "Case Sensitive".
    *   **Results:** `QTreeWidget` grouping matches by File -> Page number.
    *   **Action:** "Export CSV Report".

2.  **Core Logic:**
    *   Iterate PDFs in folder.
    *   Extract text with layout awareness (`pdfplumber`).
    *   Match keywords and extract 100-char context window.
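    *   **Threading:** Run the search off the GUI thread (e.g. in a worker pool) so the window stays responsive; text extraction is CPU-heavy, so use a process pool if parallel speed-up is needed.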

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed main.py` (`--noconsole` is an alias of `--windowed`, so one flag is enough)

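The core logic in the prompt (iterate the PDFs, search each one on a worker, group hits by file and page) can be sketched independently of the GUI. The version below is a minimal sketch under assumptions: scan_paths, search_one, and on_progress are hypothetical names, and search_one stands in for whatever per-file search the generated app uses. It relies only on the standard library, so the same function could feed a tree view of File -> Page results.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def scan_paths(paths, search_one, max_workers=4, on_progress=None):
    """Run search_one(path) on a thread pool and group hits by file, then page.

    search_one is a hypothetical callable that returns an iterable of
    (page_number, snippet) pairs for a single PDF.
    Returns (results, errors) where results maps file -> page -> [snippets].
    """
    paths = list(paths)
    results = defaultdict(lambda: defaultdict(list))
    errors = {}

    def run(path):
        # Materialise the hits inside the worker so errors surface via the future.
        return list(search_one(path))

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(run, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                for page, snippet in future.result():
                    results[path.name][page].append(snippet)
            except Exception as exc:  # unreadable PDF: record it and keep going
                errors[path.name] = str(exc)
            if on_progress:
                on_progress(done, len(paths))  # e.g. drive a progress bar

    return {name: dict(pages) for name, pages in results.items()}, errors
```

In a PyQt6 app this function would itself need to run on a background worker, with progress delivered to widgets through signals, because widgets should only be updated from the GUI thread. Threads mainly keep the interface responsive here; swapping in ProcessPoolExecutor is the usual route when the extraction work needs to run in parallel across CPU cores.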
🧠 Prompt Decoding

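  • pdfplumber vs PyPDF2: pdfplumber, built on pdfminer.six, exposes the position of each character and can approximate the page layout with extract_text(layout=True), which keeps context snippets readable. PyPDF2, now maintained as pypdf, is a lighter general-purpose PDF library.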

🛠️ Instructions

  1. Install: pip install pdfplumber (Phase 2 also needs PyQt6 and pyinstaller).
  2. Copy the prompt into your AI assistant, then run the script it generates.
