The spreadsheet config group controls how Reducto identifies table boundaries, handles large tables, and extracts cell-level metadata like colors and formulas.
Table Clustering
Many spreadsheets contain multiple tables on the same sheet, separated by empty rows or columns. Clustering detects where one table ends and another begins, so each table becomes its own block in the output.
accurate (default): Uses an LLM to analyze the sheet structure and identify table boundaries. This handles complex layouts where tables have different column structures, headers in unusual positions, or subtle separations. Costs 5x per cell compared to fast.
fast: Uses a rule-based algorithm to find tables based on empty rows/columns. Works well for simple spreadsheets where tables are clearly separated. Standard per-cell cost.
disabled: Treats the entire sheet as one table. Use this when you know each sheet contains exactly one table, or when you want raw cell data without any boundary detection.
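To make the rule-based idea behind fast mode concrete, here is a minimal sketch that splits a sheet into tables wherever a fully empty row appears. It is an illustration only, not Reducto's actual algorithm: the function names are hypothetical, and it looks at empty rows only, ignoring empty columns.

```python
from typing import List, Optional, Tuple

Row = List[Optional[str]]


def is_empty(row: Row) -> bool:
    """A row counts as empty when every cell is None or whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def split_on_empty_rows(rows: List[Row]) -> List[Tuple[int, int]]:
    """Return one (start, end) row range per table, with end exclusive."""
    tables = []
    start = None  # index of the first row of the table being scanned
    for i, row in enumerate(rows):
        if is_empty(row):
            if start is not None:
                tables.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        tables.append((start, len(rows)))
    return tables


# Hypothetical sheet: two tables separated by one empty row.
sheet = [
    ["Region", "Sales"],
    ["North", "120"],
    [None, None],
    ["Item", "Qty", "Price"],
    ["Widget", "4", "9.50"],
]
print(split_on_empty_rows(sheet))  # [(0, 2), (3, 5)]
```

A rule like this breaks down when tables sit side by side or are separated only by formatting, which is the kind of layout the accurate mode is described as handling.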
The Go SDK currently only supports default and disabled clustering modes. Use Python, Node.js, or cURL for accurate and fast modes.
Large Table Splitting
Tables with many rows create problems downstream: they can exceed LLM context windows, make chunking difficult, and slow down processing. By default, Reducto splits tables that exceed 50 rows into smaller chunks, each retaining the header row for context.
Including Cell Metadata
By default, Reducto extracts cell values only. You can optionally include colors and formulas.
Cell Colors
Financial spreadsheets often use color to convey meaning: red for negative values, green for positive, yellow for warnings. Enable cell_colors to preserve this information.
The color property is the text color; background-color is the cell highlight or fill.
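As a rough sketch of how these two properties might be read downstream, the snippet below parses a CSS-style declaration string into a dict. The sample string and the assumption that colors arrive as inline style declarations are illustrative only; check the actual output for the exact markup.

```python
def parse_style(style: str) -> dict:
    """Turn 'color: #f00; background-color: #ff0' into a property dict."""
    props = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return props


# Hypothetical style string for a single cell.
style = "color: #FF0000; background-color: #FFFF00"
props = parse_style(style)
text_color = props.get("color")             # text color
fill_color = props.get("background-color")  # cell highlight/fill
print(text_color, fill_color)  # #FF0000 #FFFF00
```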
Formulas
Spreadsheets contain computational logic in formulas. Enable formula to capture the original formula alongside the computed value.
Formulas appear as data-formula attributes in HTML output.
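A minimal sketch of reading those attributes back with Python's standard html.parser module follows. The sample HTML is hypothetical, and the assumption that data-formula sits on td or th elements is mine rather than something this page states.

```python
from html.parser import HTMLParser


class FormulaParser(HTMLParser):
    """Pairs each cell's data-formula attribute with its displayed value."""

    def __init__(self):
        super().__init__()
        self.cells = []       # (formula, value) tuples
        self._formula = None  # formula of the cell currently being read
        self._in_cell = False
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in ("td", "th"):
            self._in_cell = True
            self._formula = dict(attrs).get("data-formula")
            self._text = []

    def handle_data(self, data):
        if self._in_cell:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            # Only record cells that actually carry a formula.
            if self._formula is not None:
                self.cells.append((self._formula, "".join(self._text).strip()))
            self._in_cell = False


# Hypothetical output fragment: two plain cells and one formula cell.
sample = (
    "<table><tr><td>10</td><td>20</td>"
    '<td data-formula="=SUM(A1:B1)">30</td></tr></table>'
)
parser = FormulaParser()
parser.feed(sample)
print(parser.cells)  # [('=SUM(A1:B1)', '30')]
```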
Excluding Content
Spreadsheets often contain content you don't want to process: hidden sheets with intermediate calculations, hidden rows/columns, embedded images, or styling information.
hidden_sheets: Skips sheets marked as hidden in Excel. Many spreadsheets hide calculation sheets or raw data that isn't meant for display.
hidden_rows and hidden_cols: Skips rows and columns that are hidden. Useful when spreadsheets hide detail rows in grouped/outlined sections.
styling: Excludes all styling information (fonts, borders, colors). Use when you only need values.
spreadsheet_images: Skips embedded images and charts. These are processed separately by default, but you can skip them if not needed.
By default, hidden content IS processed. Explicitly exclude it if your spreadsheets contain sensitive data in hidden areas or if hidden content is irrelevant to your use case.
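The option names on this page can be collected and validated before building a request. The helper below is a hypothetical sketch: the option values come from this page, but the function name and the dict layout (the keys clustering, include, and exclude) are assumptions for illustration, not a confirmed request schema.

```python
CLUSTERING_MODES = {"accurate", "fast", "disabled"}
INCLUDE_OPTIONS = {"cell_colors", "formula"}
EXCLUDE_OPTIONS = {
    "hidden_sheets",
    "hidden_rows",
    "hidden_cols",
    "styling",
    "spreadsheet_images",
}


def build_spreadsheet_options(clustering="accurate", include=(), exclude=()):
    """Validate option names from this page and return them as a plain dict.

    The returned layout is an assumption, not a documented schema.
    """
    if clustering not in CLUSTERING_MODES:
        raise ValueError(f"unknown clustering mode: {clustering!r}")
    unknown = (set(include) - INCLUDE_OPTIONS) | (set(exclude) - EXCLUDE_OPTIONS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {
        "clustering": clustering,
        "include": sorted(include),
        "exclude": sorted(exclude),
    }


# Example: cheap clustering, keep formulas, drop all hidden content.
options = build_spreadsheet_options(
    clustering="fast",
    include=["formula"],
    exclude=["hidden_sheets", "hidden_rows", "hidden_cols"],
)
print(options)
```

Validating names up front catches typos such as hidden_columns before a request is sent, which matters here because hidden content is processed unless it is explicitly excluded.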