😫 The Pain Point
Your boss hands you a printed page: “Type this into Word for me.” Or you have a PDF that is really just a picture of text, so you can’t copy-paste anything. Retyping 10 pages by hand is painful.
🚀 Agentic Solution
Optical Character Recognition (OCR): The computer “reads” the pixels of the image and converts them back into editable text.
Key Features:
- Language Support: Reads English, Vietnamese, or any other language Tesseract supports, as long as the matching language pack is installed.
- Layout: Preserves paragraphs.
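Tesseract picks its languages from a `lang` argument built from language-pack codes such as `eng` and `vie`; joining codes with `+` lets it read mixed-language pages. The sketch below turns GUI labels into that string and fails early if a pack is missing. The helper name `build_lang_arg` and the label-to-code mapping are hypothetical, chosen for illustration.

```python
from typing import Iterable, Optional

# Hypothetical mapping from GUI labels to Tesseract language-pack codes
LANGUAGE_CODES = {"English": "eng", "Vietnamese": "vie"}


def build_lang_arg(selected: Iterable[str], installed: Optional[Iterable[str]] = None) -> str:
    """Turn GUI labels into a Tesseract lang string such as 'eng+vie'.

    `installed` is an optional list of available pack codes; any selected
    pack missing from it raises ValueError before OCR starts.
    """
    codes = [LANGUAGE_CODES[name] for name in selected]
    if not codes:
        raise ValueError("Select at least one language.")
    if installed is not None:
        missing = sorted(set(codes) - set(installed))
        if missing:
            raise ValueError("Missing Tesseract language pack(s): " + ", ".join(missing))
    # Tesseract reads several languages at once when codes are joined with '+'
    return "+".join(codes)
```

With recent `pytesseract` versions, the installed packs can be passed in as `build_lang_arg(["English", "Vietnamese"], installed=pytesseract.get_languages())`, and the result goes straight into `image_to_string(..., lang=...)`.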
⚔️ Phase 1: Commander (Quick Fix)
For converting a single page.
Prompt:
“Use `pytesseract` to read text from `scan.jpg`. Save the content to `output.txt`.”
Result: The extracted text is saved in `output.txt`.
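The Phase 1 prompt might yield a script along these lines. This is a minimal sketch, assuming `pytesseract` is installed and the Tesseract engine is on PATH; the function name `image_to_text_file` and its `ocr` parameter (a hook for swapping in a fake OCR function when testing) are hypothetical.

```python
from pathlib import Path
from typing import Callable, Optional


def image_to_text_file(
    image_path: str,
    output_path: str = "output.txt",
    lang: str = "eng",
    ocr: Optional[Callable[..., str]] = None,
) -> str:
    """OCR one image, save the recognized text as UTF-8, and return it."""
    if ocr is None:
        # Imported lazily; pytesseract also needs the Tesseract engine installed
        import pytesseract

        ocr = pytesseract.image_to_string
    # pytesseract.image_to_string accepts an image file path as well as a PIL image
    text = ocr(image_path, lang=lang)
    # UTF-8 keeps Vietnamese diacritics intact, including on Windows
    Path(output_path).write_text(text, encoding="utf-8")
    return text
```

Calling `image_to_text_file("scan.jpg")` would write the recognized text to `output.txt`; adding `lang="vie"` reads Vietnamese, provided that pack is installed.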
🏗️ Phase 2: Architect (Permanent Tool)
For Librarians/Data Entry.
Engineering Prompt:
**Role:** Python AI Developer
**Task:** Create an "OCR Tool".
**Requirements:**
1. **Prerequisite:** The user must install the **Tesseract-OCR** engine separately. Check that it is installed before running OCR.
2. **GUI:**
* Select Source (Image or PDF).
* Language selection (eng/vie).
* "Convert to Word" button.
3. **Logic:**
* If PDF: Convert to images first (`pdf2image`).
* Run `pytesseract.image_to_string(img)`.
* Save text to `.docx`.
4. **Deliverables:** `ocr_tool.py`, `run.bat` (Windows), `run.sh` (Mac).
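The **Logic** steps of the prompt can be sketched roughly as below, assuming `pytesseract`, `pdf2image` (with Poppler) and `python-docx` are installed. The helper names, the 300 DPI rendering setting and the choice to join wrapped lines into one Word paragraph are assumptions for illustration, not requirements from the prompt; the GUI layer is left out.

```python
import re
from pathlib import Path
from typing import List


def split_paragraphs(text: str) -> List[str]:
    """Blank lines separate paragraphs; line breaks inside one are joined."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs


def load_pages(source: str, dpi: int = 300) -> list:
    """Return OCR-ready pages: PIL images for a PDF, otherwise the image path."""
    if Path(source).suffix.lower() == ".pdf":
        # pdf2image calls Poppler, which must be installed outside pip
        from pdf2image import convert_from_path

        return convert_from_path(source, dpi=dpi)
    # pytesseract accepts an image file path directly
    return [source]


def convert_to_docx(source: str, output_path: str, lang: str = "eng") -> None:
    """OCR every page of `source` and save the paragraphs to a .docx file."""
    import pytesseract
    from docx import Document  # installed via the python-docx package

    document = Document()
    pages = load_pages(source)
    for index, page in enumerate(pages):
        text = pytesseract.image_to_string(page, lang=lang)
        for paragraph in split_paragraphs(text):
            document.add_paragraph(paragraph)
        if index < len(pages) - 1:
            # Start each source page on a new Word page
            document.add_page_break()
    document.save(output_path)
```

Joining wrapped lines turns each OCR paragraph into a single Word paragraph; it will also merge the lines of lists or tables, so a generated tool may prefer to keep the raw line breaks instead.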
🧠 Prompt Decoding
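- Dependency Hell: OCR is tricky because `pytesseract` is only a wrapper; it needs the external Tesseract engine installed on the OS. The prompt states this expectation up front so the tool can check for the engine instead of crashing with a “command not found” error (`pytesseract` raises `TesseractNotFoundError`). PDF input adds a second external dependency: `pdf2image` relies on Poppler.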
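A startup check for requirement 1 could look like this sketch. It uses only the standard library's `shutil.which`; the Windows fallback path is an assumption about a typical installer location, and the function names are hypothetical.

```python
import shutil
from typing import Iterable, Optional

# Assumed typical Windows install location; adjust to match your installer
WINDOWS_DEFAULT = r"C:\Program Files\Tesseract-OCR\tesseract.exe"


def find_tesseract(candidates: Iterable[str] = (WINDOWS_DEFAULT,)) -> Optional[str]:
    """Return a usable tesseract executable: PATH first, then fallback paths."""
    on_path = shutil.which("tesseract")
    if on_path:
        return on_path
    for candidate in candidates:
        # For a path that includes a directory, shutil.which just checks that file
        if shutil.which(candidate):
            return candidate
    return None


def require_tesseract(candidates: Iterable[str] = (WINDOWS_DEFAULT,)) -> str:
    """Fail early with a readable message instead of a 'command not found' crash."""
    path = find_tesseract(candidates)
    if path is None:
        raise RuntimeError(
            "Tesseract-OCR engine not found. Install it, then add it to PATH "
            "or pass its full path."
        )
    return path
```

The returned path can be handed to pytesseract with `pytesseract.pytesseract.tesseract_cmd = require_tesseract()`, so the GUI can show the error in a dialog rather than failing at the first conversion.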
🛠️ Instructions
- Install Tesseract-OCR (with the Vietnamese language pack if you need it) and, for PDF input, Poppler.
- Copy Prompt -> Paste -> Run.
- Select Image -> Convert.