Spreadsheets present a unique challenge: a single sheet can contain multiple logical tables, empty regions, header rows, and metadata scattered across cells. The spreadsheet config group controls how Reducto identifies table boundaries, handles large tables, and extracts cell-level metadata like colors and formulas.
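# Assumes `client` is an initialized Reducto client and `upload` is the
# result of an earlier file upload; neither is defined in this snippet.
result = client.parse.run(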
    input=upload.file_id,
    spreadsheet={
        "clustering": "accurate",
        "split_large_tables": {"enabled": True, "size": 50},
        "include": ["cell_colors", "formula"],
        "exclude": ["hidden_sheets"]
    }
)

Table Clustering

Many spreadsheets contain multiple tables on the same sheet, separated by empty rows or columns. Clustering detects where one table ends and another begins, so each table becomes its own block in the output.
spreadsheet={"clustering": "accurate"}
accurate (default): Uses an LLM to analyze the sheet structure and identify table boundaries. This handles complex layouts where tables have different column structures, headers in unusual positions, or subtle separations. Costs 5x per cell compared to fast.
fast: Uses a rule-based algorithm to find tables based on empty rows/columns. Works well for simple spreadsheets where tables are clearly separated. Standard per-cell cost.
disabled: Treats the entire sheet as one table. Use this when you know each sheet contains exactly one table, or when you want raw cell data without any boundary detection.
The Go SDK currently only supports default and disabled clustering modes. Use Python, Node.js, or cURL for accurate and fast modes.
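To make the cost difference between the modes concrete, here is a small arithmetic sketch. It uses only the 5x multiplier stated above; `relative_cost` and `COST_MULTIPLIER` are hypothetical names, the result is in relative units rather than real prices, and `disabled` is left out because this page does not state its cost.

```python
# Relative per-cell multipliers taken from the mode descriptions above.
# "disabled" is omitted because this page does not state its cost.
COST_MULTIPLIER = {"fast": 1, "accurate": 5}


def relative_cost(cell_count: int, mode: str) -> int:
    """Return the cost of clustering `cell_count` cells in relative units."""
    if cell_count < 0:
        raise ValueError("cell_count must be non-negative")
    if mode not in COST_MULTIPLIER:
        raise ValueError(f"no documented cost multiplier for mode: {mode}")
    return cell_count * COST_MULTIPLIER[mode]


print(relative_cost(20_000, "fast"))      # 20000
print(relative_cost(20_000, "accurate"))  # 100000
```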
# Simple spreadsheet with obvious table boundaries
spreadsheet={"clustering": "fast"}

# Each sheet is a single table
spreadsheet={"clustering": "disabled"}
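To illustrate what rule-based boundary detection can look like, here is a minimal sketch that splits a sheet into tables wherever a fully empty row appears. It is not Reducto's actual algorithm: `split_on_empty_rows` and the sample `sheet` are hypothetical, and the sketch ignores empty columns entirely.

```python
from typing import List, Optional, Tuple

Grid = List[List[Optional[str]]]


def is_empty_row(row: List[Optional[str]]) -> bool:
    """A row is empty when every cell is None or only whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_on_empty_rows(grid: Grid) -> List[Tuple[int, int]]:
    """Return (start, end) row ranges, end exclusive, one per run of non-empty rows."""
    ranges: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, row in enumerate(grid):
        if is_empty_row(row):
            if start is not None:
                ranges.append((start, i))  # an empty row closes the current table
                start = None
        elif start is None:
            start = i  # first non-empty row opens a new table
    if start is not None:
        ranges.append((start, len(grid)))
    return ranges


# Hypothetical sheet: two tables separated by one empty row.
sheet = [
    ["Region", "Sales"],
    ["North", "100"],
    ["South", "80"],
    [None, None],
    ["Product", "Units"],
    ["Widget", "12"],
]
print(split_on_empty_rows(sheet))  # [(0, 3), (4, 6)]
```

A heuristic like this fails as soon as a table contains a legitimately blank row or two tables sit side by side, which is the kind of layout the accurate mode is described as handling.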

Large Table Splitting

Tables with many rows create problems downstream: they can exceed LLM context windows, make chunking difficult, and slow down processing. By default, Reducto splits tables that exceed 50 rows into smaller chunks, each retaining the header row for context.
spreadsheet={
    "split_large_tables": {
        "enabled": True,
        "size": 50  # Max rows per chunk
    }
}
Each chunk becomes a separate table block. If your original table has headers in row 1 and 200 data rows, you get 4 blocks: rows 1-50, then the header row plus rows 51-100, the header row plus rows 101-150, and the header row plus rows 151-200. The header is repeated at the top of every block.
When to disable: If your downstream processing needs all rows together (for example, sorting or aggregation), disable splitting:
spreadsheet={"split_large_tables": {"enabled": False}}
Adjusting chunk size: For tables where rows are highly interdependent, increase the size. For very wide tables that consume lots of tokens, decrease it:
# More context per chunk
spreadsheet={"split_large_tables": {"size": 100}}

# Smaller chunks for wide tables
spreadsheet={"split_large_tables": {"size": 25}}
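The splitting behavior described in this section can be sketched in plain Python. This is an illustration of the idea, not Reducto's implementation: `split_table` is a hypothetical helper, and it assumes that `size` counts data rows only, with the header added on top of each chunk.

```python
from typing import List, Sequence

Row = Sequence[str]


def split_table(rows: Sequence[Row], size: int = 50) -> List[List[Row]]:
    """Split a table into chunks of at most `size` data rows, repeating the header."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if not rows:
        return []
    header, data = rows[0], list(rows[1:])
    if not data:
        return [[header]]  # header-only table stays as a single chunk
    return [[header, *data[i:i + size]] for i in range(0, len(data), size)]


# Hypothetical table: one header row followed by 200 data rows.
table = [["id", "value"]] + [[str(i), str(i * 2)] for i in range(1, 201)]
chunks = split_table(table, size=50)

print(len(chunks))      # 4
print(len(chunks[0]))   # 51 (header + 50 data rows)
print(chunks[1][1])     # ['51', '102']
```

Repeating the header costs one extra row per chunk, but it means each chunk can be interpreted on its own.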

Including Cell Metadata

By default, Reducto extracts cell values only. You can optionally include colors and formulas.

Cell Colors

Financial spreadsheets often use color to convey meaning: red for negative values, green for positive, yellow for warnings. Enable cell_colors to preserve this information:
spreadsheet={"include": ["cell_colors"]}
Colors appear as inline styles in HTML table output:
<td style="color: #FF0000; background-color: #FFFF00">-$5,000</td>
The color property is text color; background-color is cell highlight/fill.
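If you need the colors as data rather than as markup, one option is to parse the inline styles back out of the HTML. The sketch below uses only Python's standard `html.parser`; `CellColorParser` and `parse_style` are hypothetical helpers, and it assumes the output cells look like the `<td>` example above.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


def parse_style(style: str) -> Dict[str, str]:
    """Turn 'color: #FF0000; background-color: #FFFF00' into a dict."""
    props: Dict[str, str] = {}
    for decl in style.split(";"):
        if ":" in decl:
            name, value = decl.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return props


class CellColorParser(HTMLParser):
    """Collect text, text color, and fill color for every <td>."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[Dict[str, Optional[str]]] = []
        self._current: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            style = parse_style(dict(attrs).get("style") or "")
            self._current = {
                "text": "",
                "color": style.get("color"),
                "background": style.get("background-color"),
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "td" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.cells.append(self._current)
            self._current = None


parser = CellColorParser()
parser.feed(
    '<table><tr>'
    '<td style="color: #FF0000; background-color: #FFFF00">-$5,000</td>'
    '<td>12</td>'
    '</tr></table>'
)
parser.close()
print(parser.cells[0])
# {'text': '-$5,000', 'color': '#FF0000', 'background': '#FFFF00'}
```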

Formulas

Spreadsheets contain computational logic in formulas. Enable formula to capture the original formula alongside the computed value:
spreadsheet={"include": ["formula"]}
Formulas appear as data-formula attributes in HTML output:
<td data-formula="=SUM(B2:B10)">$125,000</td>
This is useful when you need to understand how values were calculated, audit spreadsheet logic, or recreate the computation elsewhere.
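As a rough illustration of auditing formulas, the following sketch pulls every `data-formula` attribute and its displayed value out of an HTML table using the standard `html.parser` module. `FormulaParser` is a hypothetical helper, and the input is assumed to follow the `<td data-formula="...">` shape shown above.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class FormulaParser(HTMLParser):
    """Collect (formula, displayed value) pairs from <td data-formula="..."> cells."""

    def __init__(self) -> None:
        super().__init__()
        self.formulas: List[Tuple[str, str]] = []
        self._formula: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # None when the cell carries no formula attribute.
            self._formula = dict(attrs).get("data-formula")
            self._text = []

    def handle_data(self, data):
        self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td":
            if self._formula is not None:
                value = "".join(self._text).strip()
                self.formulas.append((self._formula, value))
            self._formula = None
            self._text = []


parser = FormulaParser()
parser.feed(
    '<table><tr>'
    '<td>Total</td>'
    '<td data-formula="=SUM(B2:B10)">$125,000</td>'
    '</tr></table>'
)
parser.close()
print(parser.formulas)  # [('=SUM(B2:B10)', '$125,000')]
```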

Excluding Content

Spreadsheets often contain content you don’t want to process: hidden sheets with intermediate calculations, hidden rows/columns, embedded images, or styling information.
spreadsheet={"exclude": ["hidden_sheets", "hidden_rows", "hidden_cols"]}
hidden_sheets: Skips sheets marked as hidden in Excel. Many spreadsheets hide calculation sheets or raw data that isn’t meant for display.
hidden_rows and hidden_cols: Skips rows and columns that are hidden. Useful when spreadsheets hide detail rows in grouped/outlined sections.
styling: Excludes all styling information (fonts, borders, colors). Use when you only need values.
spreadsheet_images: Skips embedded images and charts. These are processed separately by default, but you can skip them if not needed.
By default, hidden content is processed. Explicitly exclude it if your spreadsheets contain sensitive data in hidden areas or if hidden content is irrelevant to your use case.
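Because these options are passed as plain strings, one way to catch typos before sending a request is to check them locally. The sketch below is a hypothetical helper, not part of any SDK: `build_spreadsheet_config` and the three option sets are assumptions that list only the names mentioned on this page, which may not be exhaustive.

```python
from typing import Any, Dict, Iterable, List, Set

# Option names as they appear on this page (assumed, possibly incomplete).
CLUSTERING_MODES = {"accurate", "fast", "disabled"}
INCLUDE_OPTIONS = {"cell_colors", "formula"}
EXCLUDE_OPTIONS = {
    "hidden_sheets", "hidden_rows", "hidden_cols", "styling", "spreadsheet_images",
}


def _check(kind: str, values: Iterable[str], allowed: Set[str]) -> List[str]:
    """Return `values` as a list, or raise if any name is not in `allowed`."""
    values = list(values)
    unknown = sorted(set(values) - allowed)
    if unknown:
        raise ValueError(f"unknown {kind} option(s): {', '.join(unknown)}")
    return values


def build_spreadsheet_config(
    clustering: str = "accurate",
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
) -> Dict[str, Any]:
    """Build a dict suitable for the `spreadsheet` argument, validating names first."""
    if clustering not in CLUSTERING_MODES:
        raise ValueError(f"unknown clustering mode: {clustering}")
    config: Dict[str, Any] = {"clustering": clustering}
    include = _check("include", include, INCLUDE_OPTIONS)
    exclude = _check("exclude", exclude, EXCLUDE_OPTIONS)
    if include:
        config["include"] = include
    if exclude:
        config["exclude"] = exclude
    return config


config = build_spreadsheet_config(
    clustering="fast",
    exclude=["hidden_sheets", "hidden_rows", "hidden_cols"],
)
print(config)
# {'clustering': 'fast', 'exclude': ['hidden_sheets', 'hidden_rows', 'hidden_cols']}
```

The resulting dict could then be passed as `spreadsheet=config` in a parse call like the ones shown on this page.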

Example: Financial Model

Financial models typically have multiple tables per sheet, use colors for emphasis, and have hidden calculation sheets:
result = client.parse.run(
    input=financial_model_url,
    spreadsheet={
        "clustering": "accurate",      # Multiple tables with complex layouts
        "include": ["cell_colors"],    # Color indicates meaning
        "exclude": ["hidden_sheets"]   # Skip calculation sheets
    }
)

Example: Data Export

Large data exports are typically single tables with thousands of rows:
result = client.parse.run(
    input=data_export_url,
    spreadsheet={
        "clustering": "disabled",                        # Single table per sheet
        "split_large_tables": {"enabled": True, "size": 100}  # Larger chunks for context
    }
)