Learn about the different formats available for table output in the API
Reducto provides several options for controlling how tables are formatted in the API response. You can specify the table output format using the table_output_format parameter in the advanced options.
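As a minimal sketch, the parameter might be set when building a request body as shown below. Only the table_output_format name and its values come from this page. The advanced_options key, the document_url field, and the build_parse_request helper are assumptions made for illustration, so check the API reference for the actual request shape.

```python
import json

# Format identifiers listed on this page.
TABLE_OUTPUT_FORMATS = {"dynamic", "html", "md", "json", "jsonbbox", "csv", "ai_json"}


def build_parse_request(document_url: str, table_output_format: str = "html") -> dict:
    """Build a request body (hypothetical shape) that selects a table format."""
    if table_output_format not in TABLE_OUTPUT_FORMATS:
        raise ValueError(f"unknown table_output_format: {table_output_format!r}")
    return {
        "document_url": document_url,  # assumed field name
        "advanced_options": {  # assumed key for the "advanced options"
            "table_output_format": table_output_format,
        },
    }


if __name__ == "__main__":
    body = build_parse_request("https://example.com/report.pdf", "dynamic")
    print(json.dumps(body, indent=2))
```

Validating the value locally catches typos such as "markdown" instead of "md" before a request is sent.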
The dynamic format (dynamic) automatically chooses between Markdown and HTML based on table complexity. This is our overall recommended format for RAG and similar use cases.
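Because the dynamic format can return either markup, downstream code may need to tell the two apart. The sketch below is a heuristic only: it assumes an HTML table starts with a <table tag and a Markdown table starts with a pipe character, neither of which is stated on this page. The detect_table_markup name is hypothetical.

```python
def detect_table_markup(table: str) -> str:
    """Guess whether a table string is HTML or Markdown (heuristic, see assumptions)."""
    stripped = table.lstrip()
    if stripped.lower().startswith("<table"):
        return "html"  # assumed: HTML tables begin with a <table> tag
    if stripped.startswith("|"):
        return "md"  # assumed: Markdown rows begin with a pipe
    return "unknown"


if __name__ == "__main__":
    print(detect_table_markup("<table><tr><td>1</td></tr></table>"))
    print(detect_table_markup("| a | b |\n|---|---|\n| 1 | 2 |"))
```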
The HTML format (html) returns tables as HTML strings with proper support for header cells (<th> tags) and merged cells (rowspan and colspan attributes). This is the default format and is recommended for accuracy-sensitive use cases, as it preserves all table information.
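To illustrate what the rowspan and colspan attributes carry, the following sketch expands an HTML table string into a rectangular grid using only the Python standard library. It assumes a single, non-nested table, and the sample markup is invented for the example rather than taken from a real API response. TableGridParser and html_table_to_grid are hypothetical names.

```python
from html.parser import HTMLParser


class TableGridParser(HTMLParser):
    """Collect the cells of one (non-nested) HTML table, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # each row is a list of cell dicts
        self._cell = None  # cell currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                self.rows.append([])
            a = dict(attrs)
            self._cell = {
                "text": "",
                "rowspan": int(a.get("rowspan") or 1),
                "colspan": int(a.get("colspan") or 1),
            }

    def handle_data(self, data):
        if self._cell is not None:
            self._cell["text"] += data

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._cell["text"] = self._cell["text"].strip()
            self.rows[-1].append(self._cell)
            self._cell = None


def html_table_to_grid(html: str) -> list[list[str]]:
    """Repeat each merged cell's text across every grid slot it spans."""
    parser = TableGridParser()
    parser.feed(html)
    grid = {}  # (row, col) -> text
    for r, row in enumerate(parser.rows):
        c = 0
        for cell in row:
            while (r, c) in grid:  # skip slots filled by earlier rowspans
                c += 1
            for dr in range(cell["rowspan"]):
                for dc in range(cell["colspan"]):
                    grid[(r + dr, c + dc)] = cell["text"]
            c += cell["colspan"]
    if not grid:
        return []
    n_rows = max(r for r, _ in grid) + 1
    n_cols = max(c for _, c in grid) + 1
    return [[grid.get((r, c), "") for c in range(n_cols)] for r in range(n_rows)]


if __name__ == "__main__":
    sample = (
        '<table><tr><th colspan="2">Region</th><th rowspan="2">Total</th></tr>'
        "<tr><th>North</th><th>South</th></tr>"
        "<tr><td>1</td><td>2</td><td>3</td></tr></table>"
    )
    for line in html_table_to_grid(sample):
        print(line)
```

Flat formats have no slot for this span information, which is why the HTML output is the safer choice when merged cells matter.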
The Markdown format (md) returns tables in GitHub-flavored Markdown format.
The JSON format (json) returns tables as nested arrays. This format is useful for programmatic processing of table data.
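As a sketch of such processing, the snippet below turns a nested-array table into a list of records. It assumes the outer array holds rows, each inner array holds that row's cell values, and the first row is the header. This page does not spell out that layout, so treat it as an assumption. The rows_to_records helper and the sample data are hypothetical.

```python
def rows_to_records(table: list[list[str]]) -> list[dict[str, str]]:
    """Map each body row to a dict keyed by the header row (assumed layout)."""
    if not table:
        return []
    header, *body = table
    return [dict(zip(header, row)) for row in body]


if __name__ == "__main__":
    table = [["Item", "Qty"], ["Apple", "3"], ["Pear", "5"]]
    for record in rows_to_records(table):
        print(record)
```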
The JSON with bounding boxes format (jsonbbox) extends the JSON format by including positional information for each cell. The coordinates are normalized to the [0, 1] range, where:
x: Distance from the left edge of the page
y: Distance from the top edge of the page
width: Cell width as a fraction of the page width
height: Cell height as a fraction of the page height
The CSV format (csv) returns tables in comma-separated values format.
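The normalized jsonbbox coordinates described earlier can be mapped back to pixel positions on a rendered page image. The sketch below assumes that x and y locate the cell's top-left corner and that each cell's box is available as a dict with the four keys named on this page. The bbox_to_pixels helper and the page size are hypothetical.

```python
def bbox_to_pixels(bbox: dict, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) in pixels for a normalized bounding box."""
    left = bbox["x"] * page_width
    top = bbox["y"] * page_height
    right = (bbox["x"] + bbox["width"]) * page_width
    bottom = (bbox["y"] + bbox["height"]) * page_height
    return (round(left), round(top), round(right), round(bottom))


if __name__ == "__main__":
    # A cell starting a quarter of the way across and halfway down the page.
    cell_bbox = {"x": 0.25, "y": 0.5, "width": 0.5, "height": 0.25}
    print(bbox_to_pixels(cell_bbox, page_width=1000, page_height=2000))
```

Because the values are fractions of the page size, the same box can be scaled to any rendering resolution.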
The AI JSON format (ai_json) uses a custom LVM to parse the table structure and return the underlying JSON data. This mode performs best when the underlying table structure is very complex and not strictly tabular, or contains many artifacts.