Reducto can extract numerical data from visualizations and output it as structured tables. This page covers how to configure chart extraction and what chart types are supported.

Three Levels of Chart Processing

Reducto offers three ways to process charts, each with different accuracy/cost tradeoffs:
| Level | Configuration | What it does |
| --- | --- | --- |
| Basic | summarize_figures: True (default) | Text descriptions for RAG search |
| Enhanced | {"scope": "figure"} | Better models, structured extraction for simpler charts |
| Advanced | {"scope": "figure", "advanced_chart_agent": True} | Multi-stage pipeline for precise numerical extraction |
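As a quick reference, the three levels can be mapped to the `enhance` argument with a small helper. This is only a sketch: `build_enhance_config` is a hypothetical name, not part of the Reducto SDK, and the dictionaries simply mirror the configurations shown on this page.

```python
from typing import Any, Dict


def build_enhance_config(level: str) -> Dict[str, Any]:
    """Return an `enhance` dict for a chart-processing level.

    Hypothetical helper: the level names ("basic", "enhanced", "advanced")
    are labels chosen for this sketch, not SDK constants.
    """
    if level == "basic":
        # Default behavior: lightweight text summaries of figures.
        return {"summarize_figures": True}
    if level == "enhanced":
        # Figure scope: classification plus structured extraction.
        return {"agentic": [{"scope": "figure"}]}
    if level == "advanced":
        # Figure scope with the multi-stage chart agent enabled.
        return {"agentic": [{"scope": "figure", "advanced_chart_agent": True}]}
    raise ValueError(f"unknown level: {level!r}")
```

The returned dict would then be passed as `enhance=build_enhance_config("advanced")` in a `client.parse.run(...)` call.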

Basic: Figure Summarization

Enabled by default. Generates natural language descriptions using a lightweight model:
result = client.parse.run(
    input=upload.file_id,
    enhance={"summarize_figures": True}  # Default, no need to specify
)
Output example: "Bar chart showing Q1-Q4 revenue growth, with Q4 reaching approximately $2.5M"

Good for making charts searchable in RAG applications. Fast and cheap, but it doesn’t extract the actual numbers.

Enhanced: Figure Scope

The figure scope uses more powerful models and classifies figures before processing:
result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [{"scope": "figure"}]
    }
)
The pipeline:
  1. Classifies whether the image is a chart or general figure
  2. If chart: runs structured extraction to pull data as text
  3. If not a chart: generates a detailed description using a more powerful model
Better than basic summarization, but not as precise as the advanced pipeline for complex charts.

Advanced: Chart Agent Pipeline

For precise numerical extraction, enable advanced_chart_agent:
result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [{"scope": "figure", "advanced_chart_agent": True}]
    }
)

How the Pipeline Works

The chart agent runs multiple parallel tasks, then combines results:
Stage 1: Parallel extraction
  • Component detection: Identifies each data series (lines, bars, areas, scatter points) and their colors/styles
  • OCR: Detects all text (axis labels, titles, legends, tick values)
  • Legend detection: Maps colors to series labels
  • Coordinate extraction: Finds axis boundaries and tick positions
Stage 2: Processing
  • Masking: Isolates each component by color/style for individual processing
  • Axis functions: Builds mathematical functions to convert pixel coordinates to actual values (handles linear, logarithmic, and time series axes)
  • Tick alignment: Maps detected points to axis tick values
Stage 3: Value extraction
  • Converts pixel coordinates to actual (x, y) values using the axis functions
  • Falls back to a VLM for components that couldn’t be processed deterministically
  • Outputs a consolidated markdown table
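To make the axis-function step concrete, here is a minimal sketch of how a pixel coordinate can be converted to a data value by interpolating between two known ticks. It is not Reducto's implementation; `make_axis_function` and its parameters are hypothetical names, and only linear and base-10 logarithmic axes are covered.

```python
import math
from typing import Callable


def make_axis_function(
    pixel_a: float,
    value_a: float,
    pixel_b: float,
    value_b: float,
    scale: str = "linear",
) -> Callable[[float], float]:
    """Build a pixel -> value mapping from two reference ticks (illustrative only)."""
    if pixel_a == pixel_b:
        raise ValueError("reference ticks must be at different pixel positions")

    if scale == "linear":
        slope = (value_b - value_a) / (pixel_b - pixel_a)
        return lambda pixel: value_a + slope * (pixel - pixel_a)

    if scale == "log":
        if value_a <= 0 or value_b <= 0:
            raise ValueError("log axes need positive tick values")
        # Interpolate linearly in log10 space, then map back.
        log_a, log_b = math.log10(value_a), math.log10(value_b)
        slope = (log_b - log_a) / (pixel_b - pixel_a)
        return lambda pixel: 10 ** (log_a + slope * (pixel - pixel_a))

    raise ValueError(f"unsupported scale: {scale!r}")


# Image y-coordinates grow downward, so the larger value sits at the smaller pixel.
linear_y = make_axis_function(pixel_a=400, value_a=0, pixel_b=200, value_b=100)
log_y = make_axis_function(pixel_a=400, value_a=1, pixel_b=200, value_b=100, scale="log")

midpoint_linear = linear_y(300)  # halfway between the ticks on a linear axis
midpoint_log = log_y(300)        # halfway between the ticks on a log axis
```

On the linear axis the midpoint pixel maps to the arithmetic midpoint of the two tick values, while on the log axis it maps to their geometric midpoint, which is why the axis type has to be identified before values are read off.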

Output Format

Data is returned as a markdown table with one row per X-axis value and one column per component:
| Date | Revenue ($M) | Expenses ($M) |
| --- | --- | --- |
| 2020-01 | 125.4 | 98.2 |
| 2020-02 | 142.8 | 105.1 |
| 2020-03 | 168.5 | 112.7 |
For bar charts, values show the range: (bottom, top).
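If you need the values programmatically, the markdown table can be parsed back into rows. The sketch below assumes a simple pipe table like the one above and assumes bar ranges appear in a cell literally as `(bottom, top)`; `parse_chart_table` and `parse_cell` are hypothetical helpers, not part of the SDK.

```python
from typing import Dict, List, Tuple, Union

Cell = Union[float, Tuple[float, float], str]


def parse_cell(cell: str) -> Cell:
    """Convert a cell to a float, a (bottom, top) tuple, or leave it as text."""
    text = cell.strip()
    if text.startswith("(") and text.endswith(")"):
        parts = text[1:-1].split(",")
        if len(parts) == 2:
            try:
                return (float(parts[0]), float(parts[1]))
            except ValueError:
                return text
    try:
        return float(text)
    except ValueError:
        return text


def split_row(line: str) -> List[str]:
    """Split one pipe-delimited row into trimmed cells."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def parse_chart_table(markdown: str) -> List[Dict[str, Cell]]:
    """Parse a markdown table into dicts keyed by column header.

    The first column (the X-axis) is kept as text so labels such as
    "2020" or "2020-01" are not turned into numbers.
    """
    lines = [ln for ln in markdown.strip().splitlines() if ln.strip()]
    if len(lines) < 2:
        return []
    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # skip the header and the --- separator row
        cells = split_row(line)
        row: Dict[str, Cell] = {header[0]: cells[0]}
        for name, cell in zip(header[1:], cells[1:]):
            row[name] = parse_cell(cell)
        rows.append(row)
    return rows


sample = """
| Date | Revenue ($M) | Expenses ($M) |
| --- | --- | --- |
| 2020-01 | 125.4 | 98.2 |
| 2020-02 | 142.8 | 105.1 |
"""
rows = parse_chart_table(sample)
```

Each entry in `rows` is a dict such as `{"Date": "2020-01", "Revenue ($M)": 125.4, "Expenses ($M)": 98.2}`.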

Supported Chart Types

| Chart Type | Support Level | Notes |
| --- | --- | --- |
| Vertical bar charts | ✅ Full | Detects bar heights and x-axis categories |
| Line charts | ✅ Full | Tracks points along each series |
| Area charts | ✅ Full | Extracts top/bottom boundaries |
| Scatter plots | ✅ Partial | Works for sparse plots; very dense plots may fail |
| Combination charts | ✅ Full | Handles mixed bar/line/area in same chart |
| Time series | ✅ Full | Supports YYYY, YYYY-MM, YYYY-MM-DD formats |
| Logarithmic axes | ✅ Full | Correctly interprets log-scale values |
| Dual Y-axis | ✅ Full | Maps components to primary or secondary axis |
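When working with time-series output downstream, the three supported label formats can be normalized to dates. This is a sketch under the assumption that X-axis labels arrive exactly as YYYY, YYYY-MM, or YYYY-MM-DD strings; `parse_axis_date` is a hypothetical helper, and missing month or day components default to 1.

```python
from datetime import date, datetime

# Most specific format first; each shorter format fails on longer input.
_AXIS_DATE_FORMATS = ("%Y-%m-%d", "%Y-%m", "%Y")


def parse_axis_date(label: str) -> date:
    """Parse a YYYY, YYYY-MM, or YYYY-MM-DD label into a date."""
    text = label.strip()
    for fmt in _AXIS_DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized time-series label: {label!r}")
```

For example, `parse_axis_date("2020-03")` yields the first day of March 2020, which makes labels of mixed granularity sortable on a common scale.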

Not Supported

The advanced pipeline skips the following chart types and falls back to a VLM description:
  • Horizontal bar charts: Axis orientation not supported
  • Pie charts: No coordinate-based extraction possible
  • Radar/spider charts: Non-Cartesian coordinate system
  • Density plots: Continuous distributions don’t map to discrete points
  • Flow charts/diagrams: Not data visualizations
  • Multiple charts in one image: Requires a single chart per figure
  • Charts with data labels: If values are already printed on each point, extraction is skipped (the data is already visible)

Custom Prompts

Guide figure processing with custom instructions:
result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [
            {"scope": "figure", "prompt": "Focus on the primary trend line, ignore confidence intervals"}
        ]
    }
)

Combining with Other Scopes

For documents with charts and complex tables:
result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [
            {"scope": "table"},
            {"scope": "figure", "advanced_chart_agent": True}
        ]
    }
)

Limitations

  • Resolution matters: Higher quality source images produce more accurate extractions
  • Processing time: The advanced pipeline is significantly slower than basic summarization. For async calls, use priority=True to speed up processing.
  • Dense charts: Scatter plots with many overlapping points may have reduced accuracy
  • Same-color styles: Charts where solid and dashed lines share the same color can confuse component detection