# Spreadsheet Processing

> Configure how Excel, CSV, and spreadsheet files are processed

Spreadsheets present a unique challenge: a single sheet can contain multiple logical tables, empty regions, header rows, and metadata scattered across cells. The `spreadsheet` config group controls how Reducto identifies table boundaries, handles large tables, and extracts cell-level metadata like colors and formulas.

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      spreadsheet={
          "clustering": "accurate",
          "split_large_tables": {"enabled": True, "size": 50},
          "include": ["cell_colors", "formula"],
          "exclude": ["hidden_sheets"]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    spreadsheet: {
      clustering: 'accurate',
      split_large_tables: { enabled: true, size: 50 },
      include: ['cell_colors', 'formula'],
      exclude: ['hidden_sheets']
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "spreadsheet": {
        "clustering": "accurate",
        "split_large_tables": {"enabled": true, "size": 50},
        "include": ["cell_colors", "formula"],
        "exclude": ["hidden_sheets"]
      }
    }'
  ```
</CodeGroup>

## Table Clustering

Many spreadsheets contain multiple tables on the same sheet, separated by empty rows or columns. Clustering detects where one table ends and another begins, so each table becomes its own block in the output.

<CodeGroup>
  ```python Python theme={null}
  spreadsheet={"clustering": "accurate"}
  ```

  ```javascript Node.js theme={null}
  spreadsheet: { clustering: 'accurate' }
  ```

  ```bash cURL theme={null}
  "spreadsheet": {"clustering": "accurate"}
  ```
</CodeGroup>

**`accurate` (default):** Uses an LLM to analyze the sheet structure and identify table boundaries. This handles complex layouts where tables have different column structures, headers in unusual positions, or subtle separations. Costs 5x the per-cell rate of `fast`.

**`fast`:** Uses a rule-based algorithm to find tables based on empty rows/columns. Works well for simple spreadsheets where tables are clearly separated. Standard per-cell cost.

**`disabled`:** Treats the entire sheet as one table. Use this when you know each sheet contains exactly one table, or when you want raw cell data without any boundary detection.

<Note>
  The Go SDK currently only supports `default` and `disabled` clustering modes. Use Python, Node.js, or cURL for `accurate` and `fast` modes.
</Note>

<CodeGroup>
  ```python Python theme={null}
  # Simple spreadsheet with obvious table boundaries
  spreadsheet={"clustering": "fast"}

  # Each sheet is a single table
  spreadsheet={"clustering": "disabled"}
  ```

  ```javascript Node.js theme={null}
  // Simple spreadsheet with obvious table boundaries
  spreadsheet: { clustering: 'fast' }

  // Each sheet is a single table
  spreadsheet: { clustering: 'disabled' }
  ```

  ```bash cURL theme={null}
  # Simple spreadsheet with obvious table boundaries
  "spreadsheet": {"clustering": "fast"}

  # Each sheet is a single table
  "spreadsheet": {"clustering": "disabled"}
  ```
</CodeGroup>
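To make the idea behind rule-based boundary detection concrete, here is a minimal sketch that splits a sheet into tables wherever a fully empty row appears. It is an illustration only, not Reducto's implementation: the function names are hypothetical, and it ignores empty columns, which a real `fast`-style algorithm would also consider.

```python
from typing import List, Optional

Row = List[Optional[str]]


def is_empty(row: Row) -> bool:
    """A row counts as empty when every cell is None or whitespace."""
    return all(cell is None or str(cell).strip() == "" for cell in row)


def cluster_by_empty_rows(rows: List[Row]) -> List[List[Row]]:
    """Group consecutive non-empty rows into tables, splitting on empty rows."""
    tables: List[List[Row]] = []
    current: List[Row] = []
    for row in rows:
        if is_empty(row):
            if current:  # an empty row closes the table in progress
                tables.append(current)
                current = []
        else:
            current.append(row)
    if current:  # flush the final table
        tables.append(current)
    return tables


# Two tables with different column structures, separated by one empty row.
sheet = [
    ["Region", "Revenue"],
    ["EMEA", "120"],
    [None, None],
    ["Product", "Units", "Returns"],
    ["Widget", "40", "2"],
    ["Gadget", "15", "0"],
]
tables = cluster_by_empty_rows(sheet)
print(len(tables))
```

A rule like this breaks down when tables touch each other or when a blank row is part of a table's layout, which is the kind of case the `accurate` mode is meant for.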

## Large Table Splitting

Tables with many rows create problems downstream: they can exceed LLM context windows, make chunking difficult, and slow down processing. By default, Reducto splits tables that exceed 50 rows into smaller chunks, each retaining the header row for context.

<CodeGroup>
  ```python Python theme={null}
  spreadsheet={
      "split_large_tables": {
          "enabled": True,
          "size": 50  # Max rows per chunk
      }
  }
  ```

  ```javascript Node.js theme={null}
  spreadsheet: {
    split_large_tables: {
      enabled: true,
      size: 50  // Max rows per chunk
    }
  }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      LargeTableChunking: reducto.F(shared.AdvancedProcessingOptionsLargeTableChunkingParam{
          Enabled: reducto.F(true),
          Size:    reducto.F(int64(50)),
      }),
  })
  ```

  ```bash cURL theme={null}
  "spreadsheet": {
    "split_large_tables": {
      "enabled": true,
      "size": 50
    }
  }
  ```
</CodeGroup>

Each chunk becomes a separate table block, and the header row is repeated in each one. For example, a table that spans rows 1-200 with its header in row 1 yields 4 blocks: rows 1-50, then the header plus rows 51-100, the header plus rows 101-150, and the header plus rows 151-200.
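The splitting behavior can be sketched in a few lines of plain Python. This is a rough illustration rather than Reducto's implementation: `split_with_header` is a hypothetical helper, and it assumes `size` counts data rows only, with the header added on top of each chunk.

```python
from typing import List, Sequence


def split_with_header(rows: Sequence[Sequence[str]], size: int) -> List[List[Sequence[str]]]:
    """Split a table into chunks of at most `size` data rows, repeating the header."""
    if size < 1:
        raise ValueError("size must be at least 1")
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    chunks = [
        [header, *data[start:start + size]]  # header first, then one slice of data
        for start in range(0, len(data), size)
    ]
    return chunks or [[header]]  # a header-only table stays a single chunk


# One header row followed by 200 data rows.
table = [["id", "amount"]] + [[str(i), str(i * 10)] for i in range(1, 201)]
chunks = split_with_header(table, size=50)
print(len(chunks))
```

Because every chunk carries its own header, each block can be interpreted without access to the others.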

**When to disable:** If your downstream processing needs all rows together (for example, sorting or aggregation), disable splitting:

<CodeGroup>
  ```python Python theme={null}
  spreadsheet={"split_large_tables": {"enabled": False}}
  ```

  ```javascript Node.js theme={null}
  spreadsheet: { split_large_tables: { enabled: false } }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      LargeTableChunking: reducto.F(shared.AdvancedProcessingOptionsLargeTableChunkingParam{
          Enabled: reducto.F(false),
      }),
  })
  ```

  ```bash cURL theme={null}
  "spreadsheet": {"split_large_tables": {"enabled": false}}
  ```
</CodeGroup>

**Adjusting chunk size:** For tables where rows are highly interdependent, increase the size. For very wide tables that consume lots of tokens, decrease it:

<CodeGroup>
  ```python Python theme={null}
  # More context per chunk
  spreadsheet={"split_large_tables": {"size": 100}}

  # Smaller chunks for wide tables
  spreadsheet={"split_large_tables": {"size": 25}}
  ```

  ```javascript Node.js theme={null}
  // More context per chunk
  spreadsheet: { split_large_tables: { size: 100 } }

  // Smaller chunks for wide tables
  spreadsheet: { split_large_tables: { size: 25 } }
  ```

  ```go Go theme={null}
  // More context per chunk
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      LargeTableChunking: reducto.F(shared.AdvancedProcessingOptionsLargeTableChunkingParam{
          Size: reducto.F(int64(100)),
      }),
  })

  // Smaller chunks for wide tables
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      LargeTableChunking: reducto.F(shared.AdvancedProcessingOptionsLargeTableChunkingParam{
          Size: reducto.F(int64(25)),
      }),
  })
  ```

  ```bash cURL theme={null}
  # More context per chunk
  "spreadsheet": {"split_large_tables": {"size": 100}}

  # Smaller chunks for wide tables
  "spreadsheet": {"split_large_tables": {"size": 25}}
  ```
</CodeGroup>
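One way to pick a chunk size is to work backward from a token budget. The sketch below is a heuristic under loud assumptions: `rows_per_chunk` is a hypothetical helper, and the figure of 4 tokens per cell is a placeholder you would need to measure on your own data.

```python
def rows_per_chunk(token_budget: int, n_columns: int, tokens_per_cell: int = 4) -> int:
    """Estimate how many rows fit in a token budget (assumed average cell cost)."""
    if token_budget < 1 or n_columns < 1 or tokens_per_cell < 1:
        raise ValueError("all arguments must be positive")
    tokens_per_row = n_columns * tokens_per_cell
    return max(1, token_budget // tokens_per_row)  # never return fewer than 1 row


narrow = rows_per_chunk(4000, n_columns=10)  # 4000 // (10 * 4)
wide = rows_per_chunk(4000, n_columns=40)    # 4000 // (40 * 4)

# Feed the estimate into the config shape shown above.
spreadsheet = {"split_large_tables": {"enabled": True, "size": wide}}
print(narrow, wide)
```

Under these assumptions, a 10-column table fits 100 rows in the same budget that holds only 25 rows of a 40-column table.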

## Including Cell Metadata

By default, Reducto extracts cell values only. You can optionally include colors and formulas.

### Cell Colors

Financial spreadsheets often use color to convey meaning: red for negative values, green for positive, yellow for warnings. Enable `cell_colors` to preserve this information:

<CodeGroup>
  ```python Python theme={null}
  spreadsheet={"include": ["cell_colors"]}
  ```

  ```javascript Node.js theme={null}
  spreadsheet: { include: ['cell_colors'] }
  ```

  ```bash cURL theme={null}
  "spreadsheet": {"include": ["cell_colors"]}
  ```
</CodeGroup>

Colors appear as inline styles in HTML table output:

```html theme={null}
<td style="color: #FF0000; background-color: #FFFF00">-$5,000</td>
```

The `color` property is text color; `background-color` is cell highlight/fill.
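If you need the colors as data rather than markup, the inline styles can be read back with the standard library. This is a minimal sketch that assumes the output looks like the `<td>` snippet above; `CellColorParser` and `parse_inline_style` are hypothetical names, not part of any Reducto SDK.

```python
from html.parser import HTMLParser
from typing import Dict, List


def parse_inline_style(style: str) -> Dict[str, str]:
    """Turn 'color: #FF0000; background-color: #FFFF00' into a dict."""
    props: Dict[str, str] = {}
    for declaration in style.split(";"):
        if ":" in declaration:
            name, value = declaration.split(":", 1)
            props[name.strip().lower()] = value.strip()
    return props


class CellColorParser(HTMLParser):
    """Collect the text, text color, and fill color of every <td> cell."""

    def __init__(self) -> None:
        super().__init__()
        self.cells: List[Dict[str, str]] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            props = parse_inline_style(dict(attrs).get("style") or "")
            self.cells.append({
                "text": "",
                "color": props.get("color", ""),
                "background": props.get("background-color", ""),
            })
            self._in_cell = True

    def handle_data(self, data):
        if self._in_cell:
            self.cells[-1]["text"] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False


parser = CellColorParser()
parser.feed('<td style="color: #FF0000; background-color: #FFFF00">-$5,000</td>')
cells = parser.cells
print(cells)
```

Cells without a `style` attribute come back with empty strings for both color fields.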

### Formulas

Spreadsheets contain computational logic in formulas. Enable `formula` to capture the original formula alongside the computed value:

<CodeGroup>
  ```python Python theme={null}
  spreadsheet={"include": ["formula"]}
  ```

  ```javascript Node.js theme={null}
  spreadsheet: { include: ['formula'] }
  ```

  ```bash cURL theme={null}
  "spreadsheet": {"include": ["formula"]}
  ```
</CodeGroup>

Formulas appear as `data-formula` attributes in HTML output:

```html theme={null}
<td data-formula="=SUM(B2:B10)">$125,000</td>
```

This is useful when you need to understand how values were calculated, audit spreadsheet logic, or recreate the computation elsewhere.
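For auditing, it can help to pull every formula out of the HTML alongside its displayed value. The following sketch assumes formulas arrive as `data-formula` attributes on `<td>` elements, as in the snippet above; `FormulaParser` is a hypothetical helper built on the standard library.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class FormulaParser(HTMLParser):
    """Collect (formula, displayed value) pairs from <td data-formula="..."> cells."""

    def __init__(self) -> None:
        super().__init__()
        self.formulas: List[Tuple[str, str]] = []
        self._formula: Optional[str] = None
        self._text: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            # None when the cell holds a plain value rather than a formula
            self._formula = dict(attrs).get("data-formula")
            self._text = []

    def handle_data(self, data):
        if self._formula is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._formula is not None:
            self.formulas.append((self._formula, "".join(self._text).strip()))
            self._formula = None


parser = FormulaParser()
parser.feed('<tr><td>Total</td><td data-formula="=SUM(B2:B10)">$125,000</td></tr>')
formulas = parser.formulas
print(formulas)
```

Plain cells such as `Total` are skipped, so the result lists only cells whose values were computed.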

## Excluding Content

Spreadsheets often contain content you don't want to process: hidden sheets with intermediate calculations, hidden rows/columns, embedded images, or styling information.

<CodeGroup>
  ```python Python theme={null}
  spreadsheet={"exclude": ["hidden_sheets", "hidden_rows", "hidden_cols"]}
  ```

  ```javascript Node.js theme={null}
  spreadsheet: { exclude: ['hidden_sheets', 'hidden_rows', 'hidden_cols'] }
  ```

  ```bash cURL theme={null}
  "spreadsheet": {"exclude": ["hidden_sheets", "hidden_rows", "hidden_cols"]}
  ```
</CodeGroup>

**`hidden_sheets`:** Skips sheets marked as hidden in Excel. Many spreadsheets hide calculation sheets or raw data that isn't meant for display.

**`hidden_rows` and `hidden_cols`:** Skips rows and columns that are hidden. Useful when spreadsheets hide detail rows in grouped/outlined sections.

**`styling`:** Excludes all styling information (fonts, borders, colors). Use when you only need values.

**`spreadsheet_images`:** Skips embedded images and charts. These are processed separately by default, but you can skip them if not needed.

By default, hidden content is processed. Exclude it explicitly if your spreadsheets contain sensitive data in hidden areas or if hidden content is irrelevant to your use case.
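Because option names are plain strings, a typo such as `hidden_columns` is easy to miss. The sketch below is a hypothetical client-side helper, not part of the Reducto SDK, that checks a config against the values named on this page before you send it. It assumes these are the only valid values, which may not hold if the API accepts others.

```python
from typing import Any, Dict, Iterable, Optional

# Values taken from this page; the API may accept more.
CLUSTERING_MODES = {"accurate", "fast", "disabled"}
INCLUDE_OPTIONS = {"cell_colors", "formula"}
EXCLUDE_OPTIONS = {"hidden_sheets", "hidden_rows", "hidden_cols", "styling", "spreadsheet_images"}


def build_spreadsheet_config(
    clustering: str = "accurate",
    include: Iterable[str] = (),
    exclude: Iterable[str] = (),
    split_size: Optional[int] = 50,
) -> Dict[str, Any]:
    """Build a `spreadsheet` dict, raising ValueError on unrecognized names."""
    include, exclude = list(include), list(exclude)
    if clustering not in CLUSTERING_MODES:
        raise ValueError(f"unknown clustering mode: {clustering!r}")
    for name, values, allowed in (
        ("include", include, INCLUDE_OPTIONS),
        ("exclude", exclude, EXCLUDE_OPTIONS),
    ):
        unknown = sorted(set(values) - allowed)
        if unknown:
            raise ValueError(f"unknown {name} option(s): {unknown}")
    if split_size is None:  # None means "do not split"
        split = {"enabled": False}
    elif split_size < 1:
        raise ValueError("split_size must be at least 1")
    else:
        split = {"enabled": True, "size": split_size}
    return {
        "clustering": clustering,
        "split_large_tables": split,
        "include": include,
        "exclude": exclude,
    }


config = build_spreadsheet_config(
    clustering="fast",
    exclude=["hidden_sheets", "hidden_rows", "hidden_cols"],
)
print(config)
```

The returned dict can then be passed as the `spreadsheet` argument shown in the examples on this page.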

## Example: Financial Model

Financial models typically have multiple tables per sheet, use colors for emphasis, and have hidden calculation sheets:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=financial_model_url,
      spreadsheet={
          "clustering": "accurate",      # Multiple tables with complex layouts
          "include": ["cell_colors"],    # Color indicates meaning
          "exclude": ["hidden_sheets"]   # Skip calculation sheets
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: financialModelUrl,
    spreadsheet: {
      clustering: 'accurate',      // Multiple tables with complex layouts
      include: ['cell_colors'],    // Color indicates meaning
      exclude: ['hidden_sheets']   // Skip calculation sheets
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/financial_model.xlsx",
      "spreadsheet": {
        "clustering": "accurate",
        "include": ["cell_colors"],
        "exclude": ["hidden_sheets"]
      }
    }'
  ```
</CodeGroup>

## Example: Data Export

Large data exports are typically single tables with thousands of rows:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=data_export_url,
      spreadsheet={
          "clustering": "disabled",                        # Single table per sheet
          "split_large_tables": {"enabled": True, "size": 100}  # Larger chunks for context
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: dataExportUrl,
    spreadsheet: {
      clustering: 'disabled',                        // Single table per sheet
      split_large_tables: { enabled: true, size: 100 }  // Larger chunks for context
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/data_export.csv",
      "spreadsheet": {
        "clustering": "disabled",
        "split_large_tables": {"enabled": true, "size": 100}
      }
    }'
  ```
</CodeGroup>
