Three Levels of Chart Processing
Reducto offers three ways to process charts, each with a different accuracy/cost tradeoff:

| Level | Configuration | What it does |
|---|---|---|
| Basic | summarize_figures: True (default) | Text descriptions for RAG search |
| Enhanced | {"scope": "figure"} | Better models, structured extraction for simpler charts |
| Advanced | {"scope": "figure", "advanced_chart_agent": True} | Multi-stage pipeline for precise numerical extraction |
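
The three configurations in the table can be written as plain Python dictionaries. This is a minimal sketch: the helper name `figure_options` is hypothetical, only the keys and values come from the table above, and how the resulting dictionary is attached to an actual parse request is not shown here.

```python
from typing import Any, Dict


def figure_options(level: str) -> Dict[str, Any]:
    """Return the figure-related options for one processing level.

    Hypothetical helper for illustration; only the keys and values
    are taken from the table of levels.
    """
    if level == "basic":
        # Default behaviour: text summaries only.
        return {"summarize_figures": True}
    if level == "enhanced":
        # Figure scope: classification plus structured extraction.
        return {"scope": "figure"}
    if level == "advanced":
        # Figure scope with the multi-stage chart agent enabled.
        return {"scope": "figure", "advanced_chart_agent": True}
    raise ValueError(f"unknown level: {level!r}")


print(figure_options("advanced"))
# {'scope': 'figure', 'advanced_chart_agent': True}
```

Keeping the level in one place like this makes it easy to start with the cheaper option and switch to the advanced pipeline only for documents that need exact values.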
Basic: Figure Summarization
Enabled by default. Generates natural language descriptions using a lightweight model, for example:
"Bar chart showing Q1-Q4 revenue growth, with Q4 reaching approximately $2.5M"
Good for making charts searchable in RAG applications. Fast and cheap, but doesn't extract actual numbers.
Enhanced: Figure Scope
The figure scope uses more powerful models and classifies figures before processing:
- Classifies whether the image is a chart or general figure
- If chart: runs structured extraction to pull data as text
- If not a chart: generates a detailed description using a more powerful model
Advanced: Chart Agent Pipeline
For precise numerical extraction, enable advanced_chart_agent alongside the figure scope.
How the Pipeline Works
The chart agent runs multiple parallel tasks, then combines results:
Stage 1: Parallel extraction
- Component detection: Identifies each data series (lines, bars, areas, scatter points) and their colors/styles
- OCR: Detects all text (axis labels, titles, legends, tick values)
- Legend detection: Maps colors to series labels
- Coordinate extraction: Finds axis boundaries and tick positions
- Masking: Isolates each component by color/style for individual processing
- Axis functions: Builds mathematical functions to convert pixel coordinates to actual values (handles linear, logarithmic, and time series axes)
- Tick alignment: Maps detected points to axis tick values
- Converts pixel coordinates to actual (x, y) values using the axis functions
- Falls back to a VLM for components that couldn't be processed deterministically
- Outputs a consolidated markdown table
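
To make the axis-function step concrete, here is a minimal sketch of converting pixel coordinates to axis values from two known ticks. It is an illustration of the general idea, not Reducto's implementation: the function name `make_axis_function` and the two-tick calibration are assumptions, and only linear and logarithmic axes are covered.

```python
import math
from typing import Callable, Tuple

Tick = Tuple[float, float]  # (pixel position, axis value)


def make_axis_function(t0: Tick, t1: Tick, scale: str = "linear") -> Callable[[float], float]:
    """Build a pixel -> value mapping from two calibrated ticks."""
    (p0, v0), (p1, v1) = t0, t1
    if p0 == p1:
        raise ValueError("ticks must have distinct pixel positions")
    if scale == "linear":
        slope = (v1 - v0) / (p1 - p0)
        return lambda p: v0 + slope * (p - p0)
    if scale == "log":
        if v0 <= 0 or v1 <= 0:
            raise ValueError("log axes require positive tick values")
        # Interpolate linearly in log10 space, then exponentiate.
        l0, l1 = math.log10(v0), math.log10(v1)
        slope = (l1 - l0) / (p1 - p0)
        return lambda p: 10 ** (l0 + slope * (p - p0))
    raise ValueError(f"unsupported scale: {scale!r}")


# Image y-coordinates grow downward: pixel 400 is value 0, pixel 100 is value 300.
y_axis = make_axis_function((400.0, 0.0), (100.0, 300.0))
print(y_axis(250.0))  # 150.0

# Log axis: pixel 400 is 1, pixel 100 is 1000.
log_axis = make_axis_function((400.0, 1.0), (100.0, 1000.0), scale="log")
print(log_axis(300.0))
```

Once such a function exists for each axis, every detected point can be mapped from pixel space to an (x, y) value, which is what makes the deterministic path independent of a vision model's numeric guesses.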
Output Format
Data is returned as a markdown table with the X-axis values as rows and each component as a column. Components that have two boundaries, such as areas, are written as (bottom, top).
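
A table in this shape is straightforward to load back into Python. The sketch below is illustrative only: the sample table is made up, the function names are hypothetical, and it assumes two-boundary values appear as `(bottom, top)` pairs in a cell.

```python
from typing import Dict, List, Tuple, Union

Value = Union[float, Tuple[float, float], str]

# Made-up sample for illustration, not real output.
SAMPLE = """
| Quarter | Revenue | Forecast band |
|---|---|---|
| Q1 | 1.2 | (1.0, 1.4) |
| Q2 | 1.8 | (1.5, 2.1) |
"""


def parse_cell(cell: str) -> Value:
    """Convert a cell to a float, a (bottom, top) pair, or leave it as text."""
    text = cell.strip()
    if text.startswith("(") and text.endswith(")"):
        parts = [p.strip() for p in text[1:-1].split(",")]
        if len(parts) == 2:
            try:
                return (float(parts[0]), float(parts[1]))
            except ValueError:
                return text
    try:
        return float(text)
    except ValueError:
        return text


def parse_chart_table(markdown: str) -> List[Dict[str, Value]]:
    """Parse a markdown table into one dict per row, keyed by column header."""
    lines = [ln.strip() for ln in markdown.strip().splitlines() if ln.strip()]

    def split_row(line: str) -> List[str]:
        return [c.strip() for c in line.strip("|").split("|")]

    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator
        cells = split_row(line)
        rows.append({name: parse_cell(cell) for name, cell in zip(header, cells)})
    return rows


rows = parse_chart_table(SAMPLE)
print(rows[0])
```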
Supported Chart Types
| Chart Type | Support Level | Notes |
|---|---|---|
| Vertical bar charts | Full | Detects bar heights and x-axis categories |
| Line charts | Full | Tracks points along each series |
| Area charts | Full | Extracts top/bottom boundaries |
| Scatter plots | Partial | Works for sparse plots; very dense plots may fail |
| Combination charts | Full | Handles mixed bar/line/area in same chart |
| Time series | Full | Supports YYYY, YYYY-MM, YYYY-MM-DD formats |
| Logarithmic axes | Full | Correctly interprets log-scale values |
| Dual Y-axis | Full | Maps components to primary or secondary axis |
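
For time series axes, tick labels in the three listed formats have to be turned into something numeric before points can be placed between them. A minimal sketch of that idea follows; the function name `parse_time_tick` is hypothetical, and mapping a missing month or day to the first of the period is an assumption made for this example.

```python
from datetime import date


def parse_time_tick(label: str) -> date:
    """Parse a YYYY, YYYY-MM, or YYYY-MM-DD tick label into a date.

    Missing month or day defaults to 1 (an assumption of this sketch).
    """
    parts = label.strip().split("-")
    if not 1 <= len(parts) <= 3 or len(parts[0]) != 4:
        raise ValueError(f"unsupported tick format: {label!r}")
    numbers = [int(p) for p in parts]
    year = numbers[0]
    month = numbers[1] if len(numbers) > 1 else 1
    day = numbers[2] if len(numbers) > 2 else 1
    return date(year, month, day)


# Ordinals give a numeric axis: the gap between two yearly ticks, in days.
start = parse_time_tick("2023")
end = parse_time_tick("2024")
print(end.toordinal() - start.toordinal())  # 365
```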
Not Supported
The advanced pipeline skips these chart types and falls back to a VLM description:
- Horizontal bar charts: Axis orientation not supported
- Pie charts: No coordinate-based extraction possible
- Radar/spider charts: Non-Cartesian coordinate system
- Density plots: Continuous distributions don't map to discrete points
- Flow charts/diagrams: Not data visualizations
- Multiple charts in one image: Requires a single chart per figure
- Charts with data labels: If values are already printed on each point, extraction is skipped (the data is already visible)
Custom Prompts
Guide figure processing with custom instructions.
Combining with Other Scopes
For documents with charts and complex tables, combine the figure scope with other scopes.
Limitations
- Resolution matters: Higher quality source images produce more accurate extractions
- Processing time: The advanced pipeline is significantly slower than basic summarization. For async calls, use priority=True to speed up processing.
- Dense charts: Scatter plots with many overlapping points may have reduced accuracy
- Same-color styles: Charts where solid and dashed lines share the same color can confuse component detection