
# Table Output Formats

> Control how tables are formatted in API responses

Reducto extracts tables from documents and can return them in several formats. The format you choose affects how merged cells, headers, and structure are represented.

## Setting Table Format

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      formatting={"table_output_format": "html"}
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    formatting: { table_output_format: 'html' }
  });
  ```

  ```go Go theme={null}
  result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
      ParseConfig: reducto.ParseConfigParam{
          DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
              shared.UnionString(upload.FileID),
          ),
          AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
              TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatHTML),
          }),
      },
  })
  if err != nil {
      log.Fatal(err) // surface request or parsing failures instead of discarding them
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "formatting": {"table_output_format": "html"}
    }'
  ```
</CodeGroup>

## Available Formats

<Tabs>
  <Tab title="dynamic">
    **Default.** Automatically chooses between markdown and HTML based on table complexity.

    * Uses **markdown** for simple tables (30 cells or fewer AND 4 merged cells or fewer)
    * Uses **HTML** for complex tables (more than 30 cells OR more than 4 merged cells)

    <CodeGroup>
      ```python Python theme={null}
      formatting={"table_output_format": "dynamic"}
      ```

      ```javascript Node.js theme={null}
      formatting: { table_output_format: 'dynamic' }
      ```

      ```go Go theme={null}
      AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
          TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatDynamic),
      })
      ```

      ```bash cURL theme={null}
      "formatting": {"table_output_format": "dynamic"}
      ```
    </CodeGroup>

    Best for RAG pipelines where you want clean, readable output for simple tables while preserving structure for complex ones.
  </Tab>

  <Tab title="html">
    Full HTML table structure with proper support for merged cells.

    <CodeGroup>
      ```python Python theme={null}
      formatting={"table_output_format": "html"}
      ```

      ```javascript Node.js theme={null}
      formatting: { table_output_format: 'html' }
      ```

      ```bash cURL theme={null}
      "formatting": {"table_output_format": "html"}
      ```
    </CodeGroup>

    ```html theme={null}
    <table>
      <tr><th colspan="2">Q1 Results</th></tr>
      <tr><th>Product</th><th>Revenue</th></tr>
      <tr><td>Widget A</td><td>$10,000</td></tr>
    </table>
    ```

    Merged cells are encoded using `rowspan` and `colspan` attributes. Use for financial statements, regulatory filings, or any table where cell merging carries meaning.
  </Tab>

  <Tab title="md">
    GitHub-flavored markdown tables.

    <CodeGroup>
      ```python Python theme={null}
      formatting={"table_output_format": "md"}
      ```

      ```javascript Node.js theme={null}
      formatting: { table_output_format: 'md' }
      ```

      ```go Go theme={null}
      AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
          TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatMd),
      })
      ```

      ```bash cURL theme={null}
      "formatting": {"table_output_format": "md"}
      ```
    </CodeGroup>

    ```markdown theme={null}
    | Header 1 | Header 2 |
    | - | - |
    | Data 1 | Data 2 |
    ```

    Markdown cannot represent merged cells. If your table has merged cells, they will be flattened. Use for simple tables where human readability matters.
  </Tab>

  <Tab title="json">
    Nested arrays for programmatic access.

    <CodeGroup>
      ```python Python theme={null}
      formatting={"table_output_format": "json"}
      ```

      ```javascript Node.js theme={null}
      formatting: { table_output_format: 'json' }
      ```

      ```go Go theme={null}
      AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
          TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatJson),
      })
      ```

      ```bash cURL theme={null}
      "formatting": {"table_output_format": "json"}
      ```
    </CodeGroup>

    ```json theme={null}
    [
      ["Header 1", "Header 2"],
      ["Data 1", "Data 2"]
    ]
    ```

    First row contains headers. All cell values are strings. Use when you need to process table data programmatically.
  </Tab>

  <Tab title="jsonbbox">
    JSON with normalized bounding box coordinates for each cell.

    <CodeGroup>
      ```python Python theme={null}
      formatting={"table_output_format": "jsonbbox"}
      ```

      ```javascript Node.js theme={null}
      formatting: { table_output_format: 'jsonbbox' }
      ```

      ```go Go theme={null}
      AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
          TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatJsonbbox),
      })
      ```

      ```bash cURL theme={null}
      "formatting": {"table_output_format": "jsonbbox"}
      ```
    </CodeGroup>

    ```json theme={null}
    [
      [
        {"text": "Header 1", "bbox": {"x": 0.1, "y": 0.2, "width": 0.3, "height": 0.04}},
        {"text": "Header 2", "bbox": {"x": 0.4, "y": 0.2, "width": 0.3, "height": 0.04}}
      ]
    ]
    ```

    Coordinates are normalized to \[0, 1] relative to page dimensions. Use when you need to know where each cell is located on the page.

    <Note>
      Agentic table enhancement is not compatible with `jsonbbox` and will be automatically disabled when this format is used.
    </Note>
  </Tab>

  <Tab title="csv">
    Comma-separated values.

    <CodeGroup>
      ```python Python theme={null}
      formatting={"table_output_format": "csv"}
      ```

      ```javascript Node.js theme={null}
      formatting: { table_output_format: 'csv' }
      ```

      ```bash cURL theme={null}
      "formatting": {"table_output_format": "csv"}
      ```
    </CodeGroup>

    ```csv theme={null}
    Header 1,Header 2
    Data 1,Data 2
    ```

    Minimal output, easy to import into spreadsheet software. Most token-efficient format.
  </Tab>
</Tabs>
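
The `dynamic` thresholds described in the first tab can be written as a small predicate. This is only a sketch of the documented rule, not Reducto's implementation: `choose_dynamic_format` is a hypothetical helper, and how cells and merged cells are counted internally is an assumption.

```python
def choose_dynamic_format(cell_count: int, merged_cell_count: int) -> str:
    """Hypothetical helper mirroring the documented `dynamic` thresholds."""
    # Simple tables: 30 cells or fewer AND 4 merged cells or fewer.
    is_simple = cell_count <= 30 and merged_cell_count <= 4
    return "md" if is_simple else "html"


print(choose_dynamic_format(cell_count=12, merged_cell_count=0))  # md
print(choose_dynamic_format(cell_count=30, merged_cell_count=4))  # md
print(choose_dynamic_format(cell_count=31, merged_cell_count=0))  # html
print(choose_dynamic_format(cell_count=10, merged_cell_count=5))  # html
```

Both conditions must hold for markdown, so a small table with many merged cells still comes back as HTML.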

## Additional Options

### Merge Tables

When a logical table spans multiple pages, Reducto may detect it as separate tables. Enable `merge_tables` to combine consecutive tables with the same column count:

<CodeGroup>
  ```python Python theme={null}
  formatting={
      "table_output_format": "html",
      "merge_tables": True
  }
  ```

  ```javascript Node.js theme={null}
  formatting: {
    table_output_format: 'html',
    merge_tables: true
  }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      TableOutputFormat: reducto.F(shared.AdvancedProcessingOptionsTableOutputFormatHTML),
      MergeTables: reducto.F(true),
  })
  ```

  ```bash cURL theme={null}
  "formatting": {
    "table_output_format": "html",
    "merge_tables": true
  }
  ```
</CodeGroup>

The algorithm:

1. Identifies consecutive tables with identical column counts
2. Uses a language model to determine if the second table is a continuation of the first
3. Combines them into a single table, removing duplicate headers

<Warning>
  Tables are merged based on column count and semantic analysis. Tables with the same number of columns but different structures may be incorrectly merged. Review output when using this option on complex documents.
</Warning>
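
To make the three steps concrete, here is a minimal sketch that operates on nested-array tables (the shape of the `json` format). It is not Reducto's implementation: `merge_consecutive_tables` is a hypothetical function, and the language-model check from step 2 is replaced by a caller-supplied `is_continuation` callback, with a toy header comparison standing in for it.

```python
from typing import Callable, List

Table = List[List[str]]  # nested arrays; first row is the header


def merge_consecutive_tables(
    tables: List[Table],
    is_continuation: Callable[[Table, Table], bool],
) -> List[Table]:
    """Merge neighbouring tables that share a column count and pass the check."""
    merged: List[Table] = []
    for table in tables:
        if not table:
            continue  # nothing to merge for an empty table
        previous = merged[-1] if merged else None
        # Step 1: only consecutive tables with identical column counts qualify.
        same_width = previous is not None and len(previous[0]) == len(table[0])
        # Step 2: stand-in for the language-model continuation check.
        if same_width and is_continuation(previous, table):
            # Step 3: append rows, dropping a header that repeats the first one.
            rows = table[1:] if table[0] == previous[0] else table
            previous.extend(rows)
        else:
            merged.append([list(row) for row in table])
    return merged


def repeats_header(first: Table, second: Table) -> bool:
    """Toy stand-in: treat a repeated header row as a continuation."""
    return first[0] == second[0]


page_1 = [["Product", "Revenue"], ["Widget A", "$10,000"]]
page_2 = [["Product", "Revenue"], ["Widget B", "$12,500"]]
other = [["Name", "Email", "Role"], ["Ada", "ada@example.com", "Admin"]]

result = merge_consecutive_tables([page_1, page_2, other], repeats_header)
print(len(result))  # 2
print(result[0])
```

The sketch also shows why the warning above matters: any two neighbouring tables with the same width reach the continuation check, so a weak check merges unrelated tables.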

### Add Page Markers

Inserts page boundary indicators into the content:

<CodeGroup>
  ```python Python theme={null}
  formatting={"add_page_markers": True}
  ```

  ```javascript Node.js theme={null}
  formatting: { add_page_markers: true }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      AddPageMarkers: reducto.F(true),
  })
  ```

  ```bash cURL theme={null}
  "formatting": {"add_page_markers": true}
  ```
</CodeGroup>

Output includes markers like:

```markdown theme={null}
[[START OF PAGE 1]]

# Document Title

Content from page 1...

[[END OF PAGE 1]]
[[START OF PAGE 2]]
```

Useful for page-specific extraction or citation tracking.
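
As a rough illustration of page-specific extraction, the markers can be split with a regular expression. This sketch assumes the marker text matches the example above exactly (`[[START OF PAGE n]]` and `[[END OF PAGE n]]`); `split_by_page` is a hypothetical helper, not part of the SDK.

```python
import re
from typing import Dict

# Match one page: a START marker, the content, and the END marker with the same number.
PAGE_BLOCK = re.compile(
    r"\[\[START OF PAGE (\d+)\]\](.*?)\[\[END OF PAGE \1\]\]",
    re.DOTALL,
)


def split_by_page(content: str) -> Dict[int, str]:
    """Map each page number to the text between its start and end markers."""
    return {int(m.group(1)): m.group(2).strip() for m in PAGE_BLOCK.finditer(content)}


sample = """[[START OF PAGE 1]]

# Document Title

Content from page 1...

[[END OF PAGE 1]]
[[START OF PAGE 2]]

Content from page 2...

[[END OF PAGE 2]]"""

pages = split_by_page(sample)
print(sorted(pages))  # [1, 2]
print(pages[2])       # Content from page 2...
```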

### Include Additional Metadata

The `formatting` group also supports extracting comments, highlights, change tracking, hyperlinks, and signatures. See [Additional Document Data](/configs/parse/additional-document-data) for details.

## Choosing the Right Format

**For LLM context (RAG, summarization):**

* Use `dynamic` (default). It balances readability with structure preservation.
* Markdown is easier for LLMs to parse than HTML for simple tables.
* Complex tables benefit from HTML to preserve relationships between cells.

**For programmatic data extraction:**

* Use `json` when you need to iterate over rows and cells.
* Use `jsonbbox` when cell positions matter (highlighting, overlays).
* Use `csv` for direct import into pandas, spreadsheets, or data pipelines.
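
A short sketch of what iterating over `json` or `csv` output might look like. It assumes you have already pulled the table content out of the parse response as a string; where that string lives in the response is not covered on this page, and `rows_to_records` is a hypothetical helper.

```python
import csv
import io
import json
from typing import Dict, List


def rows_to_records(rows: List[List[str]]) -> List[Dict[str, str]]:
    """Turn [header, row, row, ...] into one dict per data row."""
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


# Sample strings copied from the format examples on this page.
json_table = '[["Header 1", "Header 2"], ["Data 1", "Data 2"]]'
csv_table = "Header 1,Header 2\nData 1,Data 2\n"

json_records = rows_to_records(json.loads(json_table))
csv_records = rows_to_records(list(csv.reader(io.StringIO(csv_table))))

print(json_records)  # [{'Header 1': 'Data 1', 'Header 2': 'Data 2'}]
print(json_records == csv_records)  # True
```

Using the `csv` module rather than splitting on commas keeps quoted cells that contain commas, such as `"$10,000"`, intact.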

**For accuracy-critical applications:**

* Use `html`. It's the only format that preserves merged cells.
* Financial statements, regulatory filings, and complex reports need HTML to maintain correct structure.
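
If downstream code needs a rectangular grid from `html` output, the `rowspan` and `colspan` attributes can be expanded so that every grid position carries the text of the cell covering it. The following is a minimal sketch using only the Python standard library. `TableGridParser` and `expand_table` are hypothetical names, and the sketch assumes a single, non-nested table.

```python
from html.parser import HTMLParser
from typing import Dict, List, Tuple


class TableGridParser(HTMLParser):
    """Collect cells into a (row, col) grid, expanding rowspan and colspan."""

    def __init__(self) -> None:
        super().__init__()
        self.grid: Dict[Tuple[int, int], str] = {}
        self._row = -1
        self._col = 0
        self._span = (1, 1)
        self._text: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row += 1
            self._col = 0
        elif tag in ("td", "th"):
            attributes = dict(attrs)
            self._span = (
                int(attributes.get("rowspan") or 1),
                int(attributes.get("colspan") or 1),
            )
            self._text = []
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Skip positions already claimed by a rowspan from an earlier row.
            while (self._row, self._col) in self.grid:
                self._col += 1
            text = "".join(self._text).strip()
            rowspan, colspan = self._span
            for r in range(self._row, self._row + rowspan):
                for c in range(self._col, self._col + colspan):
                    self.grid[(r, c)] = text
            self._col += colspan
            self._in_cell = False


def expand_table(html: str) -> List[List[str]]:
    """Return the table as a rectangular list of rows."""
    parser = TableGridParser()
    parser.feed(html)
    parser.close()
    if not parser.grid:
        return []
    n_rows = max(r for r, _ in parser.grid) + 1
    n_cols = max(c for _, c in parser.grid) + 1
    return [[parser.grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


html = """<table>
  <tr><th colspan="2">Q1 Results</th></tr>
  <tr><th>Product</th><th>Revenue</th></tr>
  <tr><td>Widget A</td><td>$10,000</td></tr>
</table>"""

for row in expand_table(html):
    print(row)
```

Repeating the merged value in each covered position is one design choice among several; leaving the extra positions empty is equally valid if your consumer treats blanks as "same as the cell above or to the left".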

## Troubleshooting

<AccordionGroup>
  <Accordion title="Merged cells not appearing correctly">
    Use `html` format. It is the only format that preserves merged cells; markdown, JSON, and CSV output flattens them.
  </Accordion>

  <Accordion title="Tables split across pages">
    Set `merge_tables` to `true` (`True` in Python) to combine consecutive tables with the same column count.
  </Accordion>

  <Accordion title="Output is too verbose">
    Use `csv` for minimal output. If you need structure but want fewer tokens, use `json` instead of `html`.
  </Accordion>

  <Accordion title="Need to know cell positions">
    Use `jsonbbox` format. Each cell includes normalized (x, y, width, height) coordinates.
  </Accordion>
</AccordionGroup>
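
For the cell-position case, normalized `jsonbbox` coordinates usually need to be scaled to a rendered page image before drawing highlights. The sketch below makes two assumptions that this page does not state: that `x` and `y` mark the top-left corner of the cell, and that the origin is the top-left of the page. `bbox_to_pixels` is a hypothetical helper.

```python
from typing import Dict, Tuple


def bbox_to_pixels(
    bbox: Dict[str, float], page_width: int, page_height: int
) -> Tuple[int, int, int, int]:
    """Scale a normalized bbox to (left, top, right, bottom) pixel coordinates.

    Assumes (x, y) is the top-left corner with a top-left page origin.
    """
    left = round(bbox["x"] * page_width)
    top = round(bbox["y"] * page_height)
    right = round((bbox["x"] + bbox["width"]) * page_width)
    bottom = round((bbox["y"] + bbox["height"]) * page_height)
    return left, top, right, bottom


# Cell copied from the jsonbbox example; page size is an arbitrary render size.
cell = {"text": "Header 1", "bbox": {"x": 0.1, "y": 0.2, "width": 0.3, "height": 0.04}}
print(bbox_to_pixels(cell["bbox"], page_width=1000, page_height=2000))
# (100, 400, 400, 480)
```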
