📄

OCR Vietnamese

Extract text from images or scanned documents with Vietnamese language support.

Document ⭐⭐ Intermediate ⏱️ 5 minutes

😫 The Pain Point

You have scanned contracts or photos of documents in Vietnamese. You need the text searchable and editable. Retyping manually is slow and error-prone.

🚀 Agentic Solution

An OCR tool built on Tesseract's Vietnamese language data, with image preprocessing to improve recognition accuracy.

Key Features:

  • Vietnamese Language Pack: Trained for Vietnamese characters (ă, â, đ, ơ, ư).
  • Image Preprocessing: Enhance contrast for better recognition.
  • Batch Processing: Extract text from multiple images.

βš”οΈ Phase 1: Commander (Quick Fix)

For quick OCR.

Prompt:

“I have a folder named scans containing images of Vietnamese documents. Write a Python script using pytesseract to:

  1. Preprocess: Convert to grayscale, increase contrast.
  2. OCR: Extract text using the Vietnamese language pack (vie).
  3. Output: Save text to {filename}.txt for each image.

Print progress. Handle unreadable images (skip with a warning).”

Result: Editable text from all scanned documents.
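For orientation, here is a minimal sketch of the kind of script this prompt might produce. It is not the definitive output of any assistant. The names `ocr_vietnamese`, `ocr_folder`, and `IMAGE_SUFFIXES` are hypothetical, the contrast step uses Pillow's `ImageOps.autocontrast` as one possible choice, and the default OCR path assumes Pillow, pytesseract, Tesseract, and the `vie` data are all installed.

```python
from pathlib import Path

# Hypothetical list of extensions to treat as images; adjust to your scans.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def ocr_vietnamese(image_path):
    """Preprocess one image and return its text.

    Assumes Pillow, pytesseract, Tesseract and the 'vie' data are installed.
    """
    # Imported lazily so the folder logic below can run without the OCR stack.
    from PIL import Image, ImageOps
    import pytesseract

    with Image.open(image_path) as img:
        gray = ImageOps.grayscale(img)         # step 1: convert to grayscale
        boosted = ImageOps.autocontrast(gray)  # step 1: stretch the contrast
        return pytesseract.image_to_string(boosted, lang="vie")  # step 2: OCR


def ocr_folder(folder, ocr=ocr_vietnamese):
    """Write <name>.txt next to each image; return (done, skipped) file names."""
    images = sorted(
        p for p in Path(folder).iterdir() if p.suffix.lower() in IMAGE_SUFFIXES
    )
    done, skipped = [], []
    for i, path in enumerate(images, start=1):
        print(f"[{i}/{len(images)}] {path.name}")
        try:
            text = ocr(path)
        except Exception as exc:  # unreadable or corrupt image: warn and move on
            print(f"  WARNING: skipped {path.name}: {exc}")
            skipped.append(path.name)
            continue
        # step 3: save the text beside the image, e.g. page1.png -> page1.txt
        path.with_suffix(".txt").write_text(text, encoding="utf-8")
        done.append(path.name)
    return done, skipped
```

Calling `ocr_folder("scans")` would process the folder named in the prompt. Passing the OCR step as a parameter is a design choice of this sketch: it lets you swap in a different preprocessing routine without touching the loop.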

πŸ—οΈ Phase 2: Architect (Permanent Tool)

For Archivists.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Vietnamese OCR Pro" Desktop App

**Objective:** A dedicated OCR tool for Vietnamese documents with image preprocessing and bulk export.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* OCR Engine: Tesseract 5.0 (via pytesseract)
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Input:** Scan Folder / Drag & Drop Images.
    *   **Adjustments:** Sliders for "Contrast", "Brightness", "Threshold" (Live Preview).
    *   **Settings:** Language Selector (vie/eng), Output Format (TXT/Word).
    *   **Status:** "Processing 3/10..." progress bar.

2.  **Core Logic:**
    *   **Preprocessing:** Convert to Grayscale -> Deskew -> Increase Contrast (using OpenCV/Pillow).
    *   **OCR:** Multithreaded calls to Tesseract engine.
    *   **Export:** Reconstruct paragraphs and save files.
    *   **Threading:** Essential for heavy image processing tasks.

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed main.py`
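The "Multithreaded calls to Tesseract" and "Processing 3/10..." requirements in the prompt above can be illustrated with a small worker-pool sketch. This is one possible approach using the standard library's `ThreadPoolExecutor`, not the code the prompt is guaranteed to generate. The function name `run_ocr_jobs` and its parameters are hypothetical. In a real PyQt6 app, this function would itself need to run off the GUI thread, with progress delivered to the window through signals.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_ocr_jobs(paths, ocr, on_progress=None, max_workers=4):
    """Run ocr(path) for every path on a pool of worker threads.

    Returns (results, errors): results maps path -> text for successes,
    errors maps path -> error message for failures.
    on_progress(done, total) is called in the calling thread after each job.
    """
    paths = list(paths)
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Remember which future belongs to which path.
        futures = {pool.submit(ocr, p): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except Exception as exc:  # one bad image must not stop the batch
                errors[path] = str(exc)
            if on_progress is not None:
                on_progress(done, len(paths))
    return results, errors
```

A progress callback such as `lambda done, total: print(f"Processing {done}/{total}...")` reproduces the status text from the UI requirements. Threads are a reasonable fit here because pytesseract runs the Tesseract binary as a separate process for each call, so the Python threads mostly wait on that process.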

🧠 Prompt Decoding

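  • Tesseract vie: The Vietnamese language data (vie.traineddata) is not always bundled with Tesseract. If it is missing, download it separately before running OCR with lang set to vie.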
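Before running either prompt's output, it can help to confirm that the `vie` data is actually installed. The sketch below does this by calling `tesseract --list-langs` and parsing the result. The helper names `parse_langs` and `installed_langs` are hypothetical, and the parsing assumes the usual output shape: one header line followed by one language code per line.

```python
import shutil
import subprocess


def parse_langs(output):
    """Extract language codes from `tesseract --list-langs` output."""
    lines = [line.strip() for line in output.splitlines() if line.strip()]
    # Language codes are single tokens (eng, vie, chi_sim); the header line
    # and any warnings contain spaces, so they are dropped here.
    return [line for line in lines if len(line.split()) == 1]


def installed_langs():
    """Return installed language codes, or [] if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    # Read both streams in case the list is written to stderr.
    return parse_langs(proc.stdout + "\n" + proc.stderr)
```

If `"vie" in installed_langs()` is false, the language data still needs to be downloaded.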

πŸ› οΈ Instructions

  1. Install Tesseract OCR engine.
  2. Download Vietnamese language pack.
  3. Install: pip install pytesseract pillow
  4. Copy Prompt → Run.
