Overview
Agentic processing uses vision language models to enhance the accuracy of different types of content extraction. It's enabled through the enhance.agentic configuration, which accepts a list of scopes to apply agentic processing to.
Scopes
Agentic processing can be applied to three different scopes:
Text ({"scope": "text"})
- Use when: You need improved OCR accuracy for complex layouts or difficult-to-read text
- What it does: Adds an extra pass to correct table/text mistakes using AI
- Cost: Small additional cost per page
Table ({"scope": "table"})
- Use when: Financial statements, regulatory filings, scientific tables, or anything with multi-row headers, merged cells, or misaligned columns
- What it does: Reconstructs table structure to preserve row/column associations
- Prompting tips: You can specify structural requirements such as aligning rows by company name, or ensuring that all numerical values remain consistently aligned within a single row
- Cost: Additional cost based on table complexity
Figure ({"scope": "figure"})
- Use when: You need enhanced figure summaries with better accuracy than the standard summarization
- What it does: Uses advanced vision-language models to provide detailed figure analysis
- Prompting tips: Specify what visual cues should be incorporated. Example: "When provided a diagram, extract all of the figure content verbatim."
- Cost: Additional cost per figure processed
Enabling Agentic Processing
You can enable one or more agentic scopes in a single request:
Single scope
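A minimal sketch of a single-scope configuration, written as a Python dictionary. Only the enhance.agentic key and the {"scope": "text"} entry shape come from this page; the variable name is illustrative, and how the dictionary is attached to a request depends on the client you use.

```python
import json

# Request only the "text" scope. The `enhance.agentic` list and the
# {"scope": ...} entry shape come from this page; `single_scope_config`
# is just an illustrative variable name.
single_scope_config = {
    "enhance": {
        "agentic": [
            {"scope": "text"},
        ]
    }
}

# Serialize to JSON to inspect the configuration fragment.
print(json.dumps(single_scope_config, indent=2))
```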
Multiple scopes
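To combine scopes, the list can hold one entry per scope. The sketch below uses a small helper, agentic_config, which is a hypothetical convenience function and not part of any SDK; it only builds the enhance.agentic list and rejects scope names other than the three listed on this page.

```python
# The three scopes documented on this page.
VALID_SCOPES = ("text", "table", "figure")


def agentic_config(*scopes):
    """Build an `enhance` fragment with one entry per requested scope.

    Hypothetical helper for illustration; it is not an SDK function.
    """
    entries = []
    for scope in scopes:
        if scope not in VALID_SCOPES:
            raise ValueError(f"unknown scope: {scope!r}")
        entries.append({"scope": scope})
    return {"enhance": {"agentic": entries}}


# Enable all three scopes in one configuration.
multi_scope_config = agentic_config("text", "table", "figure")
print(multi_scope_config)
```

Validating scope names before sending a request is a design choice of this sketch; it surfaces typos locally instead of relying on the server's error response.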
With custom prompting
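A sketch of how a custom prompt might sit alongside a scope. This page does not name the field that carries the prompt, so the "prompt" key below is an assumption; check the API reference for the exact field name. The prompt texts are adapted from the prompting tips in the Scopes section.

```python
import json

custom_prompt_config = {
    "enhance": {
        "agentic": [
            {
                "scope": "table",
                # ASSUMPTION: "prompt" is a guessed key name for the custom prompt.
                "prompt": (
                    "Align rows by company name and keep all numerical "
                    "values consistently aligned within a single row."
                ),
            },
            {
                "scope": "figure",
                # ASSUMPTION: same guessed "prompt" key as above.
                "prompt": (
                    "When provided a diagram, extract all of the figure "
                    "content verbatim."
                ),
            },
        ]
    }
}

# Serialize to JSON to inspect the configuration fragment.
print(json.dumps(custom_prompt_config, indent=2))
```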
Migration from Legacy Config
If you were using the old experimental_options.enrich configuration or options.figure_summary.enhanced, here's how to migrate:
Legacy format
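A sketch of what a legacy configuration might look like. Only the two key paths, experimental_options.enrich and options.figure_summary.enhanced, come from this page; the inner "enabled" field and the boolean value of "enhanced" are assumptions made for illustration.

```python
legacy_config = {
    "experimental_options": {
        # ASSUMPTION: the inner shape of `enrich` is not given on this page;
        # an "enabled" flag is used here purely for illustration.
        "enrich": {"enabled": True},
    },
    "options": {
        # ASSUMPTION: `enhanced` is treated as a boolean switch.
        "figure_summary": {"enhanced": True},
    },
}

print(legacy_config)
```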
2025-10-14 format
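A sketch of the equivalent enhance.agentic configuration, produced by a hypothetical migrate_legacy_config helper. The mapping is an assumption, not documented behavior: legacy enrich is mapped to the text scope, and figure_summary.enhanced is mapped to the figure scope. The inner "enabled" field of enrich is also assumed. Verify both against the API reference before relying on them.

```python
def migrate_legacy_config(legacy):
    """Translate the two legacy switches into an `enhance.agentic` list.

    Hypothetical helper. ASSUMPTIONS: `enrich` maps to the "text" scope,
    `figure_summary.enhanced` maps to the "figure" scope, and `enrich` is
    either a boolean or a dict with an optional "enabled" flag.
    """
    agentic = []

    enrich = legacy.get("experimental_options", {}).get("enrich")
    if isinstance(enrich, dict):
        enrich_on = bool(enrich.get("enabled", True))
    else:
        enrich_on = bool(enrich)
    if enrich_on:
        agentic.append({"scope": "text"})

    figure_summary = legacy.get("options", {}).get("figure_summary", {})
    if figure_summary.get("enhanced"):
        agentic.append({"scope": "figure"})

    return {"enhance": {"agentic": agentic}}


# Example legacy input (shape assumed, see the docstring above).
migrated = migrate_legacy_config(
    {
        "experimental_options": {"enrich": {"enabled": True}},
        "options": {"figure_summary": {"enhanced": True}},
    }
)
print(migrated)
```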
Best Practices
- Start with a single scope to understand the impact on your specific documents
- Use custom prompts to guide the model toward your specific needs
- Consider the cost-accuracy tradeoff for your use case
- For tables, be specific about structural requirements in your prompt
- For figures, describe what visual elements are most important to capture