Chart Extraction

Reducto can extract numerical data from visualizations and output it as structured tables. This page covers how to configure chart extraction and what chart types are supported.

Three Levels of Chart Processing

Reducto offers three ways to process charts, each with different accuracy/cost tradeoffs:

Level	Configuration	What it does
Basic	`summarize_figures: True` (default)	Text descriptions for RAG search
Enhanced	`{"scope": "figure"}`	Better models, structured extraction for simpler charts
Advanced	`{"scope": "figure", "advanced_chart_agent": True}`	Multi-stage pipeline for precise numerical extraction
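
The three configurations above all end up in the `enhance` argument of `client.parse.run`. As a minimal sketch, a small helper can select the right dictionary for a given level. Note that `build_enhance` is a hypothetical convenience function for illustration, not part of the Reducto SDK; only the dictionaries it returns come from this page.

```python
def build_enhance(level: str) -> dict:
    """Return the `enhance` argument for a chart-processing level.

    `build_enhance` is a hypothetical helper, not an SDK function.
    The returned dictionaries mirror the configurations on this page.
    """
    if level == "basic":
        # Default behavior: lightweight figure summaries.
        return {"summarize_figures": True}
    if level == "enhanced":
        # Figure scope: classification plus structured extraction.
        return {"agentic": [{"scope": "figure"}]}
    if level == "advanced":
        # Figure scope with the multi-stage chart agent enabled.
        return {"agentic": [{"scope": "figure", "advanced_chart_agent": True}]}
    raise ValueError(f"unknown level: {level!r}")
```

The result would then be passed as `enhance=build_enhance("advanced")` in the calls shown in the following sections.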

Basic: Figure Summarization

Enabled by default. Generates natural language descriptions using a lightweight model:

result = client.parse.run(
    input=upload.file_id,
    enhance={"summarize_figures": True}  # Default, no need to specify
)

Output example: "Bar chart showing Q1-Q4 revenue growth, with Q4 reaching approximately $2.5M"

Good for making charts searchable in RAG applications. Fast and cheap, but doesn’t extract actual numbers.

Enhanced: Figure Scope

The figure scope uses more powerful models and classifies figures before processing:

result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [{"scope": "figure"}]
    }
)

The pipeline:

Classifies whether the image is a chart or general figure
If chart: runs structured extraction to pull data as text
If not a chart: generates a detailed description using a more powerful model

This is more accurate than basic summarization, but not as precise as the advanced pipeline for complex charts.

Advanced: Chart Agent Pipeline

For precise numerical extraction, enable advanced_chart_agent:

result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [{"scope": "figure", "advanced_chart_agent": True}]
    }
)

How the Pipeline Works

The chart agent runs multiple parallel tasks, then combines results:

Stage 1: Parallel extraction

Component detection: Identifies each data series (lines, bars, areas, scatter points) and their colors/styles
OCR: Detects all text (axis labels, titles, legends, tick values)
Legend detection: Maps colors to series labels
Coordinate extraction: Finds axis boundaries and tick positions
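
To make the legend detection step more concrete, the following sketch matches a detected series color to the closest legend swatch by squared RGB distance. This page does not document how Reducto performs the mapping, so treat this as one plausible approach under that assumption; `nearest_label` and the sample colors are illustrative only.

```python
def nearest_label(series_rgb, legend):
    """Return the legend label whose swatch color is closest to `series_rgb`.

    `legend` maps label -> (r, g, b). Distance is squared Euclidean
    distance in RGB space (an illustrative choice, not Reducto's
    documented method).
    """
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(legend, key=lambda label: dist2(series_rgb, legend[label]))


# Hypothetical legend swatches and a slightly off-color detected series.
legend = {"Revenue": (31, 119, 180), "Expenses": (255, 127, 14)}
label = nearest_label((30, 120, 175), legend)
```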

Stage 2: Processing

Masking: Isolates each component by color/style for individual processing
Axis functions: Builds mathematical functions to convert pixel coordinates to actual values (handles linear, logarithmic, and time series axes)
Tick alignment: Maps detected points to axis tick values

Stage 3: Value extraction

Converts pixel coordinates to actual (x, y) values using the axis functions
Falls back to a VLM for components that couldn’t be processed deterministically
Outputs a consolidated markdown table
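
The axis functions in Stage 2 and the pixel-to-value conversion in Stage 3 can be sketched as an interpolation between two reference ticks. This is a simplified illustration, assuming two known (pixel, value) pairs per axis; `make_axis_function` is a hypothetical name, and the actual pipeline may fit more ticks or handle time axes differently.

```python
import math


def make_axis_function(p0, v0, p1, v1, scale="linear"):
    """Build f(pixel) -> value from two reference ticks (pixel, value).

    Hypothetical sketch: linear axes interpolate values directly,
    log axes interpolate in log10 space.
    """
    if p0 == p1:
        raise ValueError("reference ticks must be at different pixels")
    if scale == "linear":
        slope = (v1 - v0) / (p1 - p0)
        return lambda p: v0 + (p - p0) * slope
    if scale == "log":
        if v0 <= 0 or v1 <= 0:
            raise ValueError("log axes need positive tick values")
        l0, l1 = math.log10(v0), math.log10(v1)
        slope = (l1 - l0) / (p1 - p0)
        return lambda p: 10 ** (l0 + (p - p0) * slope)
    raise ValueError(f"unknown scale: {scale!r}")


# Image y-pixels grow downward: pixel 400 is value 0, pixel 100 is value 150.
y_linear = make_axis_function(400, 0.0, 100, 150.0)
# Log axis: pixel 400 is value 1, pixel 100 is value 1000.
y_log = make_axis_function(400, 1.0, 100, 1000.0, scale="log")
```

With these functions, a detected point at pixel row 250 on the linear axis maps to the midpoint value between the two ticks, and each 100 pixels on the log axis corresponds to one decade.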

Output Format

Data is returned as a markdown table with one row per X-axis value and one column per component:

| Date | Revenue ($M) | Expenses ($M) |
| --- | --- | --- |
| 2020-01 | 125.4 | 98.2 |
| 2020-02 | 142.8 | 105.1 |
| 2020-03 | 168.5 | 112.7 |

For bar charts, each value is reported as a range: (bottom, top).
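
Because the output is a plain markdown table, it can be loaded into Python structures with the standard library alone. The sketch below is one way to do that, not an SDK feature: `parse_markdown_table` is a hypothetical helper that keeps the first (X-axis) column as text and converts other cells to floats where possible. Cells that are not plain numbers, such as bar-chart ranges, are left as strings.

```python
def parse_markdown_table(md: str) -> list[dict]:
    """Parse a simple markdown table into a list of row dictionaries.

    Hypothetical helper. Assumes a header row, a separator row, then data rows.
    """
    lines = [ln.strip() for ln in md.strip().splitlines() if ln.strip()]

    def cells(line):
        return [c.strip() for c in line.strip("|").split("|")]

    header = cells(lines[0])
    rows = []
    for line in lines[2:]:  # lines[1] is the |---|---| separator row
        row = {}
        for i, (name, cell) in enumerate(zip(header, cells(line))):
            if i == 0:
                row[name] = cell  # keep X-axis labels (dates, categories) as text
                continue
            try:
                row[name] = float(cell)
            except ValueError:
                row[name] = cell  # non-numeric cells stay as strings
        rows.append(row)
    return rows


table = """
| Date | Revenue ($M) | Expenses ($M) |
| --- | --- | --- |
| 2020-01 | 125.4 | 98.2 |
| 2020-02 | 142.8 | 105.1 |
| 2020-03 | 168.5 | 112.7 |
"""
rows = parse_markdown_table(table)
```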

Supported Chart Types

Chart Type	Support Level	Notes
Vertical bar charts	✅ Full	Detects bar heights and x-axis categories
Line charts	✅ Full	Tracks points along each series
Area charts	✅ Full	Extracts top/bottom boundaries
Scatter plots	⚠️ Partial	Works for sparse plots; very dense plots may fail
Combination charts	✅ Full	Handles mixed bar/line/area in same chart
Time series	✅ Full	Supports YYYY, YYYY-MM, YYYY-MM-DD formats
Logarithmic axes	✅ Full	Correctly interprets log-scale values
Dual Y-axis	✅ Full	Maps components to primary or secondary axis

Not Supported

The advanced pipeline skips these chart types and falls back to a VLM description:

Horizontal bar charts: Axis orientation not supported
Pie charts: No coordinate-based extraction possible
Radar/spider charts: Non-Cartesian coordinate system
Density plots: Continuous distributions don’t map to discrete points
Flow charts/diagrams: Not data visualizations
Multiple charts in one image: Requires a single chart per figure
Charts with data labels: If values are already printed on each point, extraction is skipped (the data is already visible)

Custom Prompts

Guide figure processing with custom instructions:

result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [
            {"scope": "figure", "prompt": "Focus on the primary trend line, ignore confidence intervals"}
        ]
    }
)

Combining with Other Scopes

For documents with charts and complex tables:

result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [
            {"scope": "table"},
            {"scope": "figure", "advanced_chart_agent": True}
        ]
    }
)

Limitations

Resolution matters: Higher quality source images produce more accurate extractions
Processing time: The advanced pipeline is significantly slower than basic summarization. For async calls, use priority=True to speed up processing.
Dense charts: Scatter plots with many overlapping points may have reduced accuracy
Same-color styles: Charts where solid and dashed lines share the same color can confuse component detection
