😫 The Pain Point
You received a Word document with 50 embedded images. You need those images as separate files for your website. Copying and pasting them one by one is tedious and can reduce image quality.
🚀 Agentic Solution
An Image Extractor that pulls all embedded media from documents.
Key Features:
- Multiple Formats: Word (DOCX), PowerPoint (PPTX), PDF.
- Original Quality: Extracts at embedded resolution.
- Batch Processing: Process a whole folder of documents in one run.
⚔️ Phase 1: Commander (Quick Fix)
For quick extraction.
Prompt:
“I have a Word document `report.docx` with embedded images. Write a Python script to:
- Extract: All images from the document.
- Naming: Save as `report_img_001.png`, `report_img_002.jpg`, etc.
- Output: Save to `extracted_images/` folder.
Print count of extracted images. Handle documents without images gracefully.”
Result: All images extracted at original quality.
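The script you get back will vary from run to run. As a rough idea of what a working answer can look like, here is a minimal sketch that uses only the standard library `zipfile` module rather than python-docx, relying on the fact that a DOCX is a ZIP archive with its pictures stored under `word/media/`. The function name `extract_docx_images` is hypothetical, and the numbering follows the sorted archive names, which is not guaranteed to match the order in which images appear in the document.

```python
import zipfile
from pathlib import Path


def extract_docx_images(docx_path, out_dir="extracted_images"):
    """Copy every file under word/media/ out of a DOCX and return the saved paths.

    Hypothetical helper for illustration; not part of python-docx.
    """
    docx_path = Path(docx_path)
    out_dir = Path(out_dir)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        # Embedded pictures are stored as ordinary files under word/media/.
        media = sorted(
            name for name in archive.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        )
        if media:
            out_dir.mkdir(parents=True, exist_ok=True)
        for index, name in enumerate(media, start=1):
            suffix = Path(name).suffix.lower()  # keep the stored format
            target = out_dir / f"{docx_path.stem}_img_{index:03d}{suffix}"
            # Raw bytes are copied as-is, so nothing is re-encoded or resized.
            target.write_bytes(archive.read(name))
            saved.append(target)
    # A document without images simply yields an empty list.
    return saved
```

Calling `extract_docx_images("report.docx")` and printing `len()` of the result gives the count the prompt asks for. Files keep the extension they were stored with, so a JPEG may come out as `.jpeg` rather than `.jpg`.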
🏗️ Phase 2: Architect (Permanent Tool)
For Content Managers.
Engineering Prompt:
**Role:** Python GUI Developer (PyQt6 Specialist)
**Task:** Create "Doc Media Extractor" Desktop App
**Objective:** A batch utility to extract full-resolution images from Office documents and PDFs.
**Tech Stack:**
* Language: Python 3.10+
* GUI Library: PyQt6 (Cross-platform)
* Parsers: python-docx, python-pptx, PyMuPDF (fitz)
* Packaging: PyInstaller
**Functional Requirements:**
1. **UI Layout (PyQt6):**
* **Input:** File List or Folder Selection.
* **Filters:** Toggle buttons for DOCX / PPTX / PDF.
* **Output:** Destination Folder.
* **Progress:** Gallery view of extracted images appearing in real-time.
2. **Core Logic:**
* **Office (DOCX/PPTX):** Unzip structure and extract media folder contents.
* **PDF:** Iterate objects and extract raw image streams with `PyMuPDF`.
* **Threading:** Run the extraction loop in a background thread (e.g. `QThread`) so the GUI stays responsive.
3. **Deliverables:**
* `main.py`: Complete source code.
* `requirements.txt`: Dependencies.
* **Build Instructions:**
* Windows: `pyinstaller --onefile --noconsole main.py`
* macOS: `pyinstaller --windowed main.py`
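To make the Office part of the Core Logic concrete, here is a hedged sketch of the batch step without any GUI. It assumes the usual media locations (`word/media/` for DOCX, `ppt/media/` for PPTX) and uses a standard library `ThreadPoolExecutor` as a stand-in for the Qt worker thread the prompt asks for. The names `extract_office_media` and `extract_folder` are hypothetical, and PDF handling is left out.

```python
import zipfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Assumed media folder inside each Office archive type.
MEDIA_PREFIX = {".docx": "word/media/", ".pptx": "ppt/media/"}


def extract_office_media(doc_path, out_root):
    """Extract one document's media files; return (filename, count)."""
    doc_path = Path(doc_path)
    suffix = doc_path.suffix.lower()
    prefix = MEDIA_PREFIX[suffix]
    # One output folder per document, e.g. out_root/report_docx/.
    target_dir = Path(out_root) / f"{doc_path.stem}_{suffix.lstrip('.')}"
    count = 0
    with zipfile.ZipFile(doc_path) as archive:
        for name in archive.namelist():
            if name.startswith(prefix) and not name.endswith("/"):
                target_dir.mkdir(parents=True, exist_ok=True)
                (target_dir / Path(name).name).write_bytes(archive.read(name))
                count += 1
    return doc_path.name, count


def extract_folder(folder, out_root, workers=4):
    """Process every DOCX/PPTX in a folder on worker threads."""
    docs = [
        p for p in sorted(Path(folder).iterdir())
        if p.suffix.lower() in MEDIA_PREFIX
        and not p.name.startswith("~$")  # skip Office lock files
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda p: extract_office_media(p, out_root), docs)
        return dict(results)  # {filename: number of extracted files}
```

In the desktop app the same per-document function could be called from a background worker, with each finished file reported back to the gallery view.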
🧠 Prompt Decoding
- DOCX internals: A DOCX file is a ZIP containing XML and media files.
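You can check this yourself by opening a DOCX with Python's `zipfile` module. The sketch below is illustrative only: `list_docx_parts` is a hypothetical helper, and it assumes the media files sit under `word/media/`.

```python
import zipfile


def list_docx_parts(docx_path):
    """Return (xml_parts, media_parts) found inside a DOCX archive."""
    with zipfile.ZipFile(docx_path) as archive:
        names = archive.namelist()
    # XML parts hold the text and layout; media parts are the embedded files.
    xml_parts = [n for n in names if n.endswith(".xml")]
    media_parts = [n for n in names if n.startswith("word/media/")]
    return xml_parts, media_parts
```

Renaming a `.docx` file to `.zip` and opening it in a file manager shows the same structure.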
🛠️ Instructions
- Install: `pip install python-docx python-pptx` (Phase 2 also needs `PyMuPDF` and `PyQt6`, per its tech stack).
- Copy Prompt → Run.