# Chart Extraction

> Extract structured data from charts and graphs

Reducto can extract numerical data from visualizations and output it as structured tables. This page covers how to configure chart extraction and what chart types are supported.

## Three Levels of Chart Processing

Reducto offers three ways to process charts, each with a different accuracy/cost tradeoff. The Enhanced and Advanced configurations are entries in the `enhance.agentic` list:

| Level        | Configuration                                       | What it does                                            |
| ------------ | --------------------------------------------------- | ------------------------------------------------------- |
| **Basic**    | `summarize_figures: True` (default)                 | Text descriptions for RAG search                        |
| **Enhanced** | `{"scope": "figure"}`                               | Better models, structured extraction for simpler charts |
| **Advanced** | `{"scope": "figure", "advanced_chart_agent": True}` | Multi-stage pipeline for precise numerical extraction   |
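Each level corresponds to a different `enhance` payload for `client.parse.run`. As a minimal sketch, a hypothetical helper `build_enhance` (not part of the Reducto SDK) could return the payload for each level, using only the fields that appear in the examples on this page:

```python
def build_enhance(level: str) -> dict:
    """Hypothetical helper: return the `enhance` payload for a processing level."""
    if level == "basic":
        # Figure summarization is on by default; shown explicitly for clarity.
        return {"summarize_figures": True}
    if level == "enhanced":
        return {"agentic": [{"scope": "figure"}]}
    if level == "advanced":
        return {"agentic": [{"scope": "figure", "advanced_chart_agent": True}]}
    raise ValueError(f"unknown level: {level!r}")


# Usage, assuming `client` and `upload` are set up as in the examples below:
# result = client.parse.run(input=upload.file_id, enhance=build_enhance("advanced"))
```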

## Basic: Figure Summarization

Enabled by default, figure summarization generates natural-language descriptions using a lightweight model:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      enhance={"summarize_figures": True}  # Default, no need to specify
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: { summarize_figures: true }  // Default, no need to specify
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "enhance": {"summarize_figures": true}
    }'
  ```
</CodeGroup>

**Output example:** `"Bar chart showing Q1-Q4 revenue growth, with Q4 reaching approximately $2.5M"`

Good for making charts searchable in RAG applications. Fast and cheap, but doesn't extract actual numbers.

## Enhanced: Figure Scope

The `figure` scope uses more powerful models and classifies figures before processing:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      enhance={
          "agentic": [{"scope": "figure"}]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: {
      agentic: [{ scope: 'figure' }]
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "enhance": {
        "agentic": [{"scope": "figure"}]
      }
    }'
  ```
</CodeGroup>

The pipeline:

1. Classifies whether the image is a chart or general figure
2. If chart: runs structured extraction to pull data as text
3. If not a chart: generates a detailed description using a more powerful model
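The three steps amount to a classify-then-route pattern. A minimal sketch, assuming hypothetical `classify`, `extract_chart`, and `describe` callables that stand in for the models Reducto runs internally (this illustrates the control flow only and is not Reducto's implementation):

```python
from typing import Callable


def process_figure(
    image: bytes,
    classify: Callable[[bytes], str],
    extract_chart: Callable[[bytes], str],
    describe: Callable[[bytes], str],
) -> str:
    """Route a figure by type; all three callables are hypothetical stand-ins."""
    kind = classify(image)  # step 1: e.g. "chart" or "figure"
    if kind == "chart":
        # Step 2: structured extraction, returning the data as text
        return extract_chart(image)
    # Step 3: anything that is not a chart gets a detailed description
    return describe(image)
```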

Better than basic summarization, but not as precise as the advanced pipeline for complex charts.

## Advanced: Chart Agent Pipeline

For precise numerical extraction, enable `advanced_chart_agent`:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      enhance={
          "agentic": [{"scope": "figure", "advanced_chart_agent": True}]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: {
      agentic: [{ scope: 'figure', advanced_chart_agent: true }]
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "enhance": {
        "agentic": [{"scope": "figure", "advanced_chart_agent": true}]
      }
    }'
  ```
</CodeGroup>

### How the Pipeline Works

The chart agent runs several extraction tasks in parallel, then combines their results:

**Stage 1: Parallel extraction**

* **Component detection**: Identifies each data series (lines, bars, areas, scatter points) and their colors/styles
* **OCR**: Detects all text (axis labels, titles, legends, tick values)
* **Legend detection**: Maps colors to series labels
* **Coordinate extraction**: Finds axis boundaries and tick positions
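Since the Stage 1 tasks all read the same image and are described as running in parallel, they can be dispatched concurrently and gathered into one result. A minimal sketch using Python's `concurrent.futures`, where `run_stage_one` and the extractor callables are hypothetical stand-ins, not Reducto APIs:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable


def run_stage_one(image: bytes, tasks: dict[str, Callable[[bytes], Any]]) -> dict[str, Any]:
    """Run independent extraction tasks on one image concurrently.

    `tasks` maps a task name (e.g. "components", "ocr", "legend", "coords")
    to a hypothetical extractor function.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn, image) for name, fn in tasks.items()}
        # .result() waits for each task and re-raises any exception it hit
        return {name: fut.result() for name, fut in futures.items()}
```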

**Stage 2: Processing**

* **Masking**: Isolates each component by color/style for individual processing
* **Axis functions**: Builds mathematical functions to convert pixel coordinates to actual values (handles linear, logarithmic, and time series axes)
* **Tick alignment**: Maps detected points to axis tick values
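An axis function of this kind can be built by interpolating between two ticks whose pixel positions and labels are known. A minimal sketch, assuming two such ticks per axis have already been located; the function names are illustrative and not Reducto's internals. Log axes interpolate on `log10(value)`, time axes interpolate on day counts, and `snap_to_tick` shows one plausible reading of tick alignment (snapping to the nearest tick within a tolerance):

```python
import math
from datetime import date
from typing import Callable


def linear_axis(p1: float, v1: float, p2: float, v2: float) -> Callable[[float], float]:
    """Return a pixel -> value function for a linear axis from two known ticks.

    Works in either direction, so it also handles image y coordinates,
    which grow downward while chart values grow upward.
    """
    scale = (v2 - v1) / (p2 - p1)
    return lambda p: v1 + (p - p1) * scale


def log_axis(p1: float, v1: float, p2: float, v2: float) -> Callable[[float], float]:
    """Log10 axis: interpolate linearly on log10(value), then exponentiate."""
    to_log = linear_axis(p1, math.log10(v1), p2, math.log10(v2))
    return lambda p: 10 ** to_log(p)


def date_axis(p1: float, d1: date, p2: float, d2: date) -> Callable[[float], date]:
    """Time axis: interpolate on day ordinals, rounding to the nearest day."""
    to_ordinal = linear_axis(p1, d1.toordinal(), p2, d2.toordinal())
    return lambda p: date.fromordinal(round(to_ordinal(p)))


def snap_to_tick(value: float, ticks: list[float], tolerance: float) -> float:
    """Snap a value to the nearest tick if it lies within `tolerance` of it."""
    nearest = min(ticks, key=lambda t: abs(t - value))
    return nearest if abs(nearest - value) <= tolerance else value


# Example: y pixel 400 is the 0 tick and y pixel 100 is the 300 tick.
y_value = linear_axis(400, 0, 100, 300)
print(y_value(250))  # 150.0
```

Using the two-point form for every axis type keeps the logic in one place: a log or time axis is just a linear axis over transformed values.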

**Stage 3: Value extraction**

* Converts pixel coordinates to actual (x, y) values using the axis functions
* Falls back to a vision-language model (VLM) for components that couldn't be processed deterministically
* Outputs a consolidated markdown table
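The consolidation step can be pictured as merging each component's (x, y) pairs into a single table keyed by x. A minimal sketch, assuming the values have already been converted from pixels; `to_markdown_table` is a hypothetical helper, and the exact formatting Reducto emits may differ:

```python
def to_markdown_table(x_label: str, series: dict[str, dict[str, float]]) -> str:
    """Merge per-component {x: y} mappings into one markdown table.

    Rows follow the order in which x values first appear; a component
    with no value at a given x gets an empty cell.
    """
    xs: list[str] = []
    for points in series.values():
        for x in points:
            if x not in xs:
                xs.append(x)
    names = list(series)
    lines = [
        "| " + " | ".join([x_label, *names]) + " |",
        "| " + " | ".join(["---"] * (len(names) + 1)) + " |",
    ]
    for x in xs:
        cells = [str(series[n][x]) if x in series[n] else "" for n in names]
        lines.append("| " + " | ".join([x, *cells]) + " |")
    return "\n".join(lines)
```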

### Output Format

Data is returned as a markdown table with one row per X-axis value and one column per component:

```markdown theme={null}
| Date | Revenue ($M) | Expenses ($M) |
| --- | --- | --- |
| 2020-01 | 125.4 | 98.2 |
| 2020-02 | 142.8 | 105.1 |
| 2020-03 | 168.5 | 112.7 |
```

For bar charts, each value is reported as a `(bottom, top)` range rather than a single number.
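To work with the table programmatically, you can parse it back into Python values. A minimal sketch, assuming you have already pulled the table text out of the parse result (where it lives in the response is not covered here) and that cells contain no literal `|`; `parse_chart_table` is a hypothetical helper, not part of the Reducto SDK. It keeps the first (X-axis) column as a string and turns bar-chart `(bottom, top)` cells into tuples:

```python
def parse_chart_table(markdown: str) -> list[dict[str, object]]:
    """Parse a chart-extraction markdown table into one dict per data row.

    The first (X-axis) column is kept as a string. Other cells become floats,
    "(bottom, top)" cells become (float, float) tuples, and anything that
    does not parse (e.g. an empty cell) is left as the original string.
    """

    def parse_cell(cell: str) -> object:
        try:
            if cell.startswith("(") and cell.endswith(")"):
                bottom, top = cell[1:-1].split(",")
                return (float(bottom), float(top))
            return float(cell)
        except ValueError:
            return cell

    rows = [
        [c.strip() for c in line.strip().strip("|").split("|")]
        for line in markdown.strip().splitlines()
        if line.strip()
    ]
    header, body = rows[0], rows[2:]  # rows[1] is the | --- | separator
    return [
        {header[0]: row[0], **{h: parse_cell(c) for h, c in zip(header[1:], row[1:])}}
        for row in body
    ]
```

With a bar cell parsed as `(bottom, top)`, the bar's height is `top - bottom`.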

## Supported Chart Types

| Chart Type              | Support Level | Notes                                             |
| ----------------------- | ------------- | ------------------------------------------------- |
| **Vertical bar charts** | ✅ Full        | Detects bar heights and x-axis categories         |
| **Line charts**         | ✅ Full        | Tracks points along each series                   |
| **Area charts**         | ✅ Full        | Extracts top/bottom boundaries                    |
| **Scatter plots**       | ✅ Partial     | Works for sparse plots; very dense plots may fail |
| **Combination charts**  | ✅ Full        | Handles mixed bar/line/area in same chart         |
| **Time series**         | ✅ Full        | Supports YYYY, YYYY-MM, YYYY-MM-DD formats        |
| **Logarithmic axes**    | ✅ Full        | Correctly interprets log-scale values             |
| **Dual Y-axis**         | ✅ Full        | Maps components to primary or secondary axis      |
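Time-series X values use one of the YYYY, YYYY-MM, or YYYY-MM-DD formats listed in the table. To sort them or join them with other data, you might normalize them to dates. A minimal sketch using only the standard library; `parse_period` is a hypothetical helper, and mapping each label to the first day of its period is an assumption of this example, not something Reducto does:

```python
from datetime import date, datetime


def parse_period(label: str) -> date:
    """Parse a YYYY, YYYY-MM, or YYYY-MM-DD label to the first day it covers."""
    # Try the most specific format first so "2020-03-15" is not rejected early
    for fmt in ("%Y-%m-%d", "%Y-%m", "%Y"):
        try:
            return datetime.strptime(label, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unsupported date label: {label!r}")
```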

### Not Supported

The advanced pipeline skips these chart types and falls back to a VLM description:

* **Horizontal bar charts**: Axis orientation not supported
* **Pie charts**: No coordinate-based extraction possible
* **Radar/spider charts**: Non-Cartesian coordinate system
* **Density plots**: Continuous distributions don't map to discrete points
* **Flow charts/diagrams**: Not data visualizations
* **Multiple charts in one image**: Requires a single chart per figure
* **Charts with data labels**: If values are already printed on each point, extraction is skipped (the data is already visible)

## Custom Prompts

Guide figure processing with custom instructions by adding a `prompt` field to the figure scope:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      enhance={
          "agentic": [
              {"scope": "figure", "prompt": "Focus on the primary trend line, ignore confidence intervals"}
          ]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: {
      agentic: [
        { scope: 'figure', prompt: 'Focus on the primary trend line, ignore confidence intervals' }
      ]
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "enhance": {
        "agentic": [
          {"scope": "figure", "prompt": "Focus on the primary trend line, ignore confidence intervals"}
        ]
      }
    }'
  ```
</CodeGroup>

## Combining with Other Scopes

For documents that contain both charts and complex tables, list both scopes in `agentic`:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      enhance={
          "agentic": [
              {"scope": "table"},
              {"scope": "figure", "advanced_chart_agent": True}
          ]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: {
      agentic: [
        { scope: 'table' },
        { scope: 'figure', advanced_chart_agent: true }
      ]
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "enhance": {
        "agentic": [
          {"scope": "table"},
          {"scope": "figure", "advanced_chart_agent": true}
        ]
      }
    }'
  ```
</CodeGroup>

## Limitations

* **Resolution matters**: Higher quality source images produce more accurate extractions
* **Processing time**: The advanced pipeline is significantly slower than basic summarization. For async calls, use `priority=True` to speed up processing.
* **Dense charts**: Scatter plots with many overlapping points may have reduced accuracy
* **Same-color styles**: Charts where solid and dashed lines share the same color can confuse component detection
