
Extract Images from PDF

Pull every image out of a PDF and save each one as a separate file.

PDF ⭐ Beginner ⏱️ 3 minutes

🎯 The problem to solve

Have a PDF brochure with great photos you want to reuse? Need to pull a logo out of a PDF? Want the charts and graphs from a report?

Pain points:

  • Screenshots are low quality
  • Copy-paste loses resolution
  • You can't get the original image file

⚖️ Comparison: Before and After

| Criterion | Screenshot | Extract |
|---|---|---|
| Quality | Degraded | Original |
| Resolution | Screen resolution (72 dpi) | Full |
| Format | PNG only | Original format |

💡 Sample prompt

Extract images from a PDF:

INPUT: [PDF file]

EXTRACTION:
- All images: Yes
- Minimum size: 100x100 px (skip small icons)
- Pages: all / specific

OUTPUT:
- Format: Original (or convert to PNG/JPG)
- Naming: page{n}_img{m}.{ext}
- Folder: extracted_images/
- Create index: a list with page references
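The sample prompt maps fairly directly onto PyMuPDF. Below is a minimal sketch, assuming PyMuPDF is installed (`pip install pymupdf`, imported as `fitz`). `output_name`, `big_enough` and `extract_images` are hypothetical helper names. Writing the index as `index.csv` is an assumption about what "index" should mean. The sketch saves each image's raw stored bytes via `Document.extract_image`, so nothing is re-encoded.

```python
import csv
from pathlib import Path
from typing import Dict, List, Optional, Set


def output_name(page_no: int, img_no: int, ext: str) -> str:
    """File name in the page{n}_img{m}.{ext} pattern (both counters 1-based)."""
    return f"page{page_no}_img{img_no}.{ext}"


def big_enough(width: int, height: int, min_w: int = 100, min_h: int = 100) -> bool:
    """Filter out small icons: keep images of at least min_w x min_h pixels."""
    return width >= min_w and height >= min_h


def extract_images(pdf_path: str, out_dir: str = "extracted_images",
                   pages: Optional[Set[int]] = None) -> List[Dict]:
    """Save embedded images in their stored format and write index.csv.

    pages: 1-based page numbers to process; None means all pages.
    """
    import fitz  # PyMuPDF; imported lazily so the helpers above work without it

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    index: List[Dict] = []
    with fitz.open(pdf_path) as doc:
        for page in doc:
            page_no = page.number + 1  # page.number is 0-based
            if pages is not None and page_no not in pages:
                continue
            img_no = 0
            for img in page.get_images(full=True):
                info = doc.extract_image(img[0])  # img[0] is the image xref
                if not info or not big_enough(info["width"], info["height"]):
                    continue
                img_no += 1
                name = output_name(page_no, img_no, info["ext"])
                (out / name).write_bytes(info["image"])  # raw stored bytes
                index.append({"file": name, "page": page_no,
                              "width": info["width"], "height": info["height"]})
    # Hypothetical index format: one CSV row per saved file with its page number
    with open(out / "index.csv", "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["file", "page", "width", "height"])
        writer.writeheader()
        writer.writerows(index)
    return index
```

For example, `extract_images("report.pdf", pages={2, 5})` would process only pages 2 and 5 (the file name is illustrative). An image reused on several pages is saved once per page here. That keeps the page references in the index complete, at the cost of duplicate files. Soft masks (transparency) are not merged back into the saved images in this sketch.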

🏗️ Phase 2: Architect (Permanent Tool)

For Designers/Content Creators.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "PDF Asset Miner" Desktop App

**Objective:** A tool to extract raw image resources from PDF documents at original quality.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Extraction: PyMuPDF (fitz)
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Input:** Source PDF.
    *   **Filter:** Min Width/Height sliders (to filter out icons/lines).
    *   **Format:** "Original" vs "Convert to PNG/JPG".
    *   **Gallery:** Thumbnail grid of found images.

2.  **Core Logic:**
    *   Iterate over PDF objects to find image streams (Image XObjects).
    *   Extract metadata (width, height, color space).
    *   Save raw bytes or convert.
    *   **Threading:** Extraction loop runs in background.

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed main.py` (`--noconsole` is only an alias of `--windowed`, which builds a `.app` bundle on macOS)
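The Core Logic and Threading requirements can be sketched independently of the GUI. This is an illustration under assumptions rather than the app itself. `ImageMeta`, `scan_pdf` and `start_scan` are hypothetical names, and the sketch uses the standard library's `threading` and `queue` instead of PyQt6. In the real app, a `QTimer` could drain the queue into the thumbnail grid. Alternatively, the same loop could live in a `QThread` that emits signals.

```python
import queue
import threading
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class ImageMeta:
    """Metadata the gallery needs before anything is saved (hypothetical type)."""
    page: int        # 1-based page number
    xref: int        # PDF object number of the image XObject
    width: int       # pixels
    height: int      # pixels
    colorspace: str  # e.g. "DeviceRGB"


def scan_pdf(pdf_path: str) -> Iterator[ImageMeta]:
    """Yield each distinct image XObject once, with its stats, via PyMuPDF."""
    import fitz  # PyMuPDF

    seen = set()  # the same xref can be reused on many pages
    with fitz.open(pdf_path) as doc:
        for page in doc:
            for img in page.get_images(full=True):
                xref, _smask, width, height, _bpc, colorspace = img[:6]
                if xref in seen:
                    continue
                seen.add(xref)
                yield ImageMeta(page.number + 1, xref, width, height, colorspace)


def start_scan(source: Iterable[ImageMeta], min_w: int, min_h: int,
               results: "queue.Queue[Optional[ImageMeta]]") -> threading.Thread:
    """Run the extraction loop on a daemon thread; the GUI polls `results`."""
    def run() -> None:
        try:
            for meta in source:
                # Same role as the Min Width/Height sliders
                if meta.width >= min_w and meta.height >= min_h:
                    results.put(meta)
        finally:
            results.put(None)  # sentinel: scan finished (or failed)

    worker = threading.Thread(target=run, daemon=True)
    worker.start()
    return worker
```

A call like `start_scan(scan_pdf("deck.pdf"), 100, 100, results)` would begin scanning while the window stays responsive; `None` in the queue marks the end. Because `scan_pdf` is a generator, its body runs on the worker thread rather than the GUI thread. That includes opening the PDF.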

🔧 Tips & Best Practices

Image types in PDF

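| Type | Extractable? |
|---|---|
| Embedded JPEG | Yes, original bytes |
| Embedded PNG | Yes, pixel-for-pixel (stored as Flate-compressed data and re-encoded as PNG) |
| Vector graphics | As raster (rendered at a chosen DPI) or SVG |
| Charts | As an image (rendered, if drawn as vectors) |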
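Vector graphics and charts drawn with vector commands have no stored image to extract, so they have to be rendered. Here is a sketch, assuming PyMuPDF. `render_to_png` rasterises a page, or just a clipping rectangle around a chart, at a chosen DPI using a zoom matrix of dpi / 72. `export_svg` keeps the page as vectors via `Page.get_svg_image`. The function names and the 300 dpi default are illustrative choices. Recent PyMuPDF releases also accept `page.get_pixmap(dpi=...)` directly.

```python
from typing import Optional, Tuple

POINTS_PER_INCH = 72  # PDF user space unit: 1 pt = 1/72 inch


def zoom_for_dpi(dpi: float) -> float:
    """PyMuPDF renders 1 pt as 1 pixel (72 dpi) by default, so scale by dpi / 72."""
    return dpi / POINTS_PER_INCH


def render_to_png(pdf_path: str, page_no: int, out_png: str, dpi: float = 300,
                  clip: Optional[Tuple[float, float, float, float]] = None) -> None:
    """Rasterise a page, or only the clip rectangle (x0, y0, x1, y1 in points)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        page = doc[page_no - 1]  # page_no is 1-based
        zoom = zoom_for_dpi(dpi)
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom),
                              clip=fitz.Rect(*clip) if clip else None)
        pix.save(out_png)


def export_svg(pdf_path: str, page_no: int, out_svg: str) -> None:
    """Keep vector drawings as vectors by exporting the whole page as SVG."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        svg = doc[page_no - 1].get_svg_image()
    with open(out_svg, "w", encoding="utf-8") as fh:
        fh.write(svg)
```

For instance, `render_to_png("report.pdf", 3, "chart.png", dpi=300, clip=(50, 100, 550, 400))` would render one region of page 3. The clip coordinates here are made up; they would need to be read off the actual page in PyMuPDF's page coordinates.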

Quality considerations

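  • Embedded images come out at exactly the quality stored in the PDF
  • That stored copy may already be downsampled: the tool that created the PDF may have reduced image resolution to shrink the file
  • Converting vector content to raster requires choosing a resolution (DPI)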
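Both considerations come down to PDF sizes being measured in points (1/72 inch) while images are measured in pixels. Below is a small sketch of the arithmetic, with hypothetical helper names. An image's effective PPI is its pixel width divided by its displayed width in inches; a low value hints that the stored copy may have been downsampled. Going the other way, rendering vector content needs size_pt × dpi / 72 pixels. `report_ppi` assumes PyMuPDF's `Page.get_image_info()` and measures the bounding box, so it ignores rotation.

```python
import math

POINTS_PER_INCH = 72  # 1 PDF point = 1/72 inch


def effective_ppi(pixel_width: int, shown_width_pt: float) -> float:
    """Pixels per inch at which an embedded image is actually displayed."""
    return pixel_width * POINTS_PER_INCH / shown_width_pt


def pixels_needed(size_pt: float, dpi: float) -> int:
    """Raster size (rounded up) needed to render size_pt points at dpi."""
    return math.ceil(size_pt * dpi / POINTS_PER_INCH)


def report_ppi(pdf_path: str) -> None:
    """Print the effective PPI of every placed image (bounding box, no rotation)."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        for page in doc:
            for info in page.get_image_info():
                x0, _y0, x1, _y1 = info["bbox"]
                if x1 > x0:
                    ppi = effective_ppi(info["width"], x1 - x0)
                    print(f"page {page.number + 1}: {info['width']} px wide, ~{ppi:.0f} ppi")
```

For example, a 1200 px wide image placed 288 pt (4 in) wide is shown at 300 ppi. Rendering the width of an A4 page (about 595 pt) at 300 dpi needs 2480 px.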

Use cases

  • Asset recovery: Recover images from old designs
  • Research: Extract charts for analysis
  • Archiving: Backup images separately

Difficulty: ⭐ Beginner | Time: 3 minutes
