Extract Images from Docs

Extract all embedded images from Word, PowerPoint, or PDF documents.

Document ⭐ Beginner ⏱️ 3 minutes

😫 The Pain Point

You received a Word document with 50 embedded images, and you need them as separate files for your website. Copying and pasting each one out of the document is slow, manual work and often loses quality.

🚀 Agentic Solution

An Image Extractor that pulls all embedded media from documents.

Key Features:

  • Multiple Formats: Word (DOCX), PowerPoint (PPTX), PDF.
  • Original Quality: Saves each image at the resolution stored in the document.
  • Batch Processing: Processes a whole folder of documents in one run.

⚔️ Phase 1: Commander (Quick Fix)

For quick extraction.

Prompt:

“I have a Word document report.docx with embedded images. Write a Python script to:

  1. Extract: All images from the document.
  2. Naming: Save as report_img_001.png, report_img_002.jpg, etc.
  3. Output: Save to extracted_images/ folder.

Print the count of extracted images. Handle documents without images gracefully.”

Result: All images extracted at original quality.
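
For orientation, here is a minimal sketch of the kind of script this prompt might produce. It is an illustration, not the definitive output: the function name `extract_docx_images` is hypothetical, and it assumes the images sit in the conventional `word/media/` folder of the DOCX archive, so it reads them with the standard-library `zipfile` module instead of python-docx. Numbering follows the sorted archive names, which may differ from the order the images appear in the document, and the folder can also hold non-raster media such as EMF files.

```python
import zipfile
from pathlib import Path, PurePosixPath


def extract_docx_images(docx_path, out_dir="extracted_images"):
    """Copy every file under word/media/ out of a DOCX and return the count.

    Hypothetical helper for illustration; assumes the conventional
    word/media/ location for embedded media.
    """
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    count = 0
    with zipfile.ZipFile(docx_path) as archive:
        # Collect media entries, skipping directory placeholders.
        media = sorted(
            name for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )
        if media:
            out_dir.mkdir(parents=True, exist_ok=True)
        for count, name in enumerate(media, start=1):
            ext = PurePosixPath(name).suffix.lower()
            # e.g. report_img_001.png, report_img_002.jpg
            target = out_dir / f"{docx_path.stem}_img_{count:03d}{ext}"
            # Write the stored bytes unchanged, so nothing is re-encoded.
            target.write_bytes(archive.read(name))
    # count stays 0 when the document has no embedded media.
    return count


# Example call (assumes report.docx exists in the working directory):
# total = extract_docx_images("report.docx")
# print(f"Extracted {total} image(s)" if total else "No embedded images found")
```

Because the bytes are copied straight out of the archive, the files keep whatever format and resolution Word stored.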

🏗️ Phase 2: Architect (Permanent Tool)

For Content Managers.

Engineering Prompt:

**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Doc Media Extractor" Desktop App

**Objective:** A batch utility to extract full-resolution images from Office documents and PDFs.

**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Parsers: python-docx, python-pptx, PyMuPDF (fitz)
* Packaging: PyInstaller

**Functional Requirements:**
1.  **UI Layout (PyQt6):**
    *   **Input:** File List or Folder Selection.
    *   **Filters:** Toggle buttons for DOCX / PPTX / PDF.
    *   **Output:** Destination Folder.
    *   **Progress:** Gallery view of extracted images appearing in real-time.

2.  **Core Logic:**
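    *   **Office (DOCX/PPTX):** Open the file as a ZIP archive and extract the contents of its media folder (`word/media/` for DOCX, `ppt/media/` for PPTX).
    *   **PDF:** Iterate over the pages and extract the raw image streams with `PyMuPDF`.
    *   **Threading:** Run the extraction loop in a background thread so the UI stays responsive while images are processed.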

3.  **Deliverables:**
    *   `main.py`: Complete source code.
    *   `requirements.txt`: Dependencies.
    *   **Build Instructions:**
        *   Windows: `pyinstaller --onefile --noconsole main.py`
        *   macOS: `pyinstaller --windowed main.py`

🧠 Prompt Decoding

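  • DOCX internals: A DOCX file is a ZIP archive containing XML parts and media files. Embedded images normally sit in its word/media/ folder (ppt/media/ for PPTX), which is why they can be copied out without re-encoding.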
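
To see this structure for yourself, the short sketch below lists the media entries inside an Office file using only the standard library. The helper name `list_embedded_media` and the `MEDIA_PREFIX` table are hypothetical names chosen for this example, and the folder locations are the conventional ones written by Word and PowerPoint, not a guarantee for every file.

```python
import zipfile
from pathlib import Path

# Assumed (conventional) media locations inside each Office ZIP container.
MEDIA_PREFIX = {".docx": "word/media/", ".pptx": "ppt/media/"}


def list_embedded_media(path):
    """Return the archive entries stored in the document's media folder."""
    prefix = MEDIA_PREFIX.get(Path(path).suffix.lower())
    if prefix is None:
        raise ValueError(f"Unsupported file type: {path}")
    with zipfile.ZipFile(path) as archive:
        # Keep only real files under the media folder, not directory entries.
        return [
            name for name in archive.namelist()
            if name.startswith(prefix) and not name.endswith("/")
        ]
```

Calling `list_embedded_media("report.docx")` on a document with pictures would typically return names such as `word/media/image1.png`; an empty list means there is nothing to extract.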

🛠️ Instructions

  1. Install: pip install python-docx python-pptx PyMuPDF (add PyQt6 for the Phase 2 app)
  2. Copy Prompt → Run.
