
# Split Configuration

> Configure how documents are divided into sections

Split identifies sections in a document based on natural language descriptions. This page covers the configuration options that control how sections are identified and returned.

For basic usage and the full workflow, see the [Split endpoint documentation](/split).

## split\_description

The `split_description` array defines what sections to look for. Each entry has two required fields, `name` and `description`, and an optional `partition_key`:

<CodeGroup>
  ```python Python theme={null}
  split_description=[
      {
          "name": "Account Summary",
          "description": "Overview section with balances and totals at the top of the statement",
          "partition_key": "account_number"  # Optional
      }
  ]
  ```

  ```javascript Node.js theme={null}
  split_description: [
    {
      name: 'Account Summary',
      description: 'Overview section with balances and totals at the top of the statement',
      partition_key: 'account_number'  // Optional
    }
  ]
  ```

  ```bash cURL theme={null}
  "split_description": [
    {
      "name": "Account Summary",
      "description": "Overview section with balances and totals at the top of the statement",
      "partition_key": "account_number"
    }
  ]
  ```
</CodeGroup>

**name**: The identifier returned in results. Use names that make sense for your downstream processing logic.

**description**: Natural language description of the section's content. The LLM uses this to classify pages. Be specific about what makes this section recognizable: content type, position in document, visual characteristics.

**partition\_key**: Optional. Use it for sections that repeat with different identifiers (multiple accounts, multiple patients, multiple companies). When set, Split extracts the identifier value from the document and groups pages by that value.

### Writing Effective Descriptions

The description is passed to an LLM that classifies each page. Vague descriptions lead to ambiguous classifications.

```python theme={null}
# Vague - could match many things
{"name": "Tables", "description": "Pages with tables"}

# Specific - clear criteria for classification
{"name": "Transaction History", "description": "Table showing individual transactions with dates, descriptions, and amounts. Usually appears after the account summary section."}
```

Include distinguishing characteristics:

* Content type (tables, narrative text, forms)
* Position (beginning, end, after section X)
* Visual elements (headers, logos, signature lines)
* What it does NOT include (to avoid confusion with similar sections)
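
As a minimal sketch of the checklist above, the helper below assembles a description from those four characteristics. `build_description` and its parameter names are hypothetical and not part of the Reducto SDK; the only thing Split receives is the resulting string.

```python
def build_description(content, position=None, visual=None, excludes=None):
    """Join the distinguishing characteristics into one description string.

    Hypothetical helper for illustration; Split only sees the final text.
    """
    parts = [content.rstrip(".") + "."]
    if position:
        parts.append(f"Usually appears {position}.")
    if visual:
        parts.append(f"Look for {visual}.")
    if excludes:
        parts.append(f"Does not include {excludes}.")
    return " ".join(parts)


entry = {
    "name": "Transaction History",
    "description": build_description(
        content="Table showing individual transactions with dates, descriptions, and amounts",
        position="after the account summary section",
        excludes="summary rows of balances and totals",
    ),
}
print(entry["description"])
```

Keeping each characteristic as a separate argument makes it harder to forget the exclusion clause, which is often what separates two similar sections.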

***

## partition\_key

Partition key handles a common scenario: the same section type repeating for different entities. A consolidated statement has holdings for multiple accounts. A medical record packet has intake forms for multiple patients.

Without partition key, Split returns all matching pages as one group. You'd then need to figure out where one entity ends and the next begins. Partition key does this automatically.

<CodeGroup>
  ```python Python theme={null}
  split_description=[
      {
          "name": "Holdings",
          "description": "Investment holdings table for a specific account",
          "partition_key": "account number"
      }
  ]
  ```

  ```javascript Node.js theme={null}
  split_description: [
    {
      name: 'Holdings',
      description: 'Investment holdings table for a specific account',
      partition_key: 'account number'
    }
  ]
  ```

  ```bash cURL theme={null}
  "split_description": [
    {
      "name": "Holdings",
      "description": "Investment holdings table for a specific account",
      "partition_key": "account number"
    }
  ]
  ```
</CodeGroup>

The response includes partitions with extracted identifier values:

```json theme={null}
{
  "result": {
    "splits": [
      {
        "name": "Holdings",
        "pages": [1, 2, 3, 7, 8, 9, 10, 11],
        "conf": "high",
        "partitions": [
          {"name": "1234-5678", "pages": [1, 2, 3], "conf": "high"},
          {"name": "8765-4321", "pages": [7, 8, 9, 10, 11], "conf": "high"}
        ]
      }
    ]
  }
}
```

The `name` in each partition is the actual value extracted from the document. If the document shows "Account #1234-5678" on pages 1-3 and "Account #8765-4321" on pages 7-11, those become your partition names.

**The partition key is semantic, not literal.** If you set `partition_key` to "account number" but the document says "Acct #1234" or "Portfolio ID: 5678", Split will still find it. Describe what the identifier represents, not the exact text format.
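
For downstream processing, it can help to turn the partitions into a lookup keyed by the extracted identifier. The sketch below assumes the response is available as a plain dictionary with the shape shown above; `pages_by_partition` is a hypothetical helper, not an SDK function.

```python
# Response copied from the example above, as a plain dict.
response = {
    "result": {
        "splits": [
            {
                "name": "Holdings",
                "pages": [1, 2, 3, 7, 8, 9, 10, 11],
                "conf": "high",
                "partitions": [
                    {"name": "1234-5678", "pages": [1, 2, 3], "conf": "high"},
                    {"name": "8765-4321", "pages": [7, 8, 9, 10, 11], "conf": "high"},
                ],
            }
        ]
    }
}


def pages_by_partition(response, section_name):
    """Return {extracted identifier: pages} for one section, or {} if there are none."""
    for split in response["result"]["splits"]:
        if split["name"] == section_name:
            # "partitions" is null when no partition_key was set for the section.
            partitions = split.get("partitions") or []
            return {p["name"]: p["pages"] for p in partitions}
    return {}


by_account = pages_by_partition(response, "Holdings")
print(by_account)
```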

### When partition\_key values appear in tables

By default, Split truncates table content to speed up processing. If your partition key values appear deep within tables (not in headers or the first few rows), the truncation might hide them.

Set `table_cutoff` to `preserve` to keep full table content:

<CodeGroup>
  ```python Python theme={null}
  result = client.split.run(
      input=upload.file_id,
      split_description=[
          {
              "name": "Holdings",
              "description": "Investment holdings table",
              "partition_key": "account_number"
          }
      ],
      settings={"table_cutoff": "preserve"}
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.split.run({
    input: upload.file_id,
    split_description: [
      {
        name: 'Holdings',
        description: 'Investment holdings table',
        partition_key: 'account_number'
      }
    ],
    settings: { table_cutoff: 'preserve' }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [
        {
          "name": "Holdings",
          "description": "Investment holdings table",
          "partition_key": "account_number"
        }
      ],
      "settings": {"table_cutoff": "preserve"}
    }'
  ```
</CodeGroup>

This increases processing time, but identifier values deep in table rows stay visible to Split instead of being truncated away.

***

## split\_rules

`split_rules` is a natural language prompt that controls how pages are assigned to sections. The default rule:

```
"Split the document into the applicable sections. Sections may only overlap at their first and last page if at all."
```

This means a page can belong to multiple sections only at boundaries. Page 5 can belong to both "Section A" and "Section B" only if it's the last page of A and the first page of B.

Customize for your use case:

<CodeGroup>
  ```python Python theme={null}
  # Allow full overlap (page can belong to multiple sections anywhere)
  split_rules="Pages can belong to multiple sections. A page with both summary data and transaction data should appear in both sections."

  # Force exclusive classification (each page belongs to exactly one section)
  split_rules="Each page must belong to exactly one section. Assign to the most relevant section."

  # Document-specific logic
  split_rules="The cover page (page 1) should not be assigned to any section. Start section detection from page 2."
  ```

  ```javascript Node.js theme={null}
  // Allow full overlap
  split_rules: 'Pages can belong to multiple sections. A page with both summary data and transaction data should appear in both sections.'

  // Force exclusive classification
  split_rules: 'Each page must belong to exactly one section. Assign to the most relevant section.'

  // Document-specific logic
  split_rules: 'The cover page (page 1) should not be assigned to any section. Start section detection from page 2.'
  ```

  ```bash cURL theme={null}
  "split_rules": "Pages can belong to multiple sections. A page with both summary data and transaction data should appear in both sections."
  ```
</CodeGroup>

The string is passed directly to the LLM as instructions. Write it as you would write instructions for a person doing the classification.
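
Because the rule is an instruction to an LLM, you may want to check the returned page assignments against it. The sketch below assumes `splits` is a list of plain dictionaries with `name` and `pages`, as in the response format. `shared_pages` and `boundary_only` are hypothetical helpers, and the sample data is invented.

```python
from collections import defaultdict


def shared_pages(splits):
    """Map each page that appears in more than one section to those section names."""
    sections_by_page = defaultdict(list)
    for split in splits:
        for page in split["pages"]:
            sections_by_page[page].append(split["name"])
    return {page: names for page, names in sections_by_page.items() if len(names) > 1}


def boundary_only(splits):
    """True if every shared page is the first or last page of each section it is in."""
    pages_by_name = {s["name"]: s["pages"] for s in splits}
    for page, names in shared_pages(splits).items():
        for name in names:
            pages = pages_by_name[name]
            if page not in (min(pages), max(pages)):
                return False
    return True


splits = [
    {"name": "Section A", "pages": [1, 2, 3, 4, 5]},
    {"name": "Section B", "pages": [5, 6, 7]},
]
print(shared_pages(splits))   # pages assigned to more than one section
print(boundary_only(splits))  # whether the default overlap rule holds
```

With an exclusive rule ("each page must belong to exactly one section"), `shared_pages` should come back empty.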

***

## settings

### table\_cutoff

Controls how table content is processed during section detection.

<CodeGroup>
  ```python Python theme={null}
  settings={"table_cutoff": "truncate"}  # Default
  settings={"table_cutoff": "preserve"}
  ```

  ```javascript Node.js theme={null}
  settings: { table_cutoff: 'truncate' }  // Default
  settings: { table_cutoff: 'preserve' }
  ```

  ```bash cURL theme={null}
  "settings": {"table_cutoff": "truncate"}
  "settings": {"table_cutoff": "preserve"}
  ```
</CodeGroup>

**truncate (default)**: Tables are shortened to the first few rows. Faster processing. Works for most cases where section identifiers appear in headers, titles, or surrounding text.

**preserve**: Full table content is retained. Required when `partition_key` values or section identifiers appear deep within table rows. Slower but more thorough.
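
One way to balance the two modes is to try `truncate` first and retry with `preserve` only when no partitions come back. This is a sketch of that idea, not documented Split behavior: it assumes a missed partition key shows up as a null or empty `partitions` field, and `run_split` is a hypothetical callable that wraps your own Split call and returns a dictionary-shaped response. `fake_run_split` stands in for the API so the example runs on its own.

```python
def split_with_fallback(run_split, section_name):
    """Try truncate first; retry with preserve if the section has no partitions.

    run_split is a hypothetical callable: it takes a settings dict and
    returns a dict-shaped Split response.
    """
    for cutoff in ("truncate", "preserve"):
        response = run_split({"table_cutoff": cutoff})
        for split in response["result"]["splits"]:
            if split["name"] == section_name and split.get("partitions"):
                return cutoff, split["partitions"]
    return "preserve", []


def fake_run_split(settings):
    """Stand-in for the API: partitions only appear when tables are preserved."""
    partitions = None
    if settings["table_cutoff"] == "preserve":
        partitions = [{"name": "1234-5678", "pages": [1, 2, 3], "conf": "high"}]
    return {
        "result": {
            "splits": [
                {"name": "Holdings", "pages": [1, 2, 3], "conf": "high", "partitions": partitions}
            ]
        }
    }


mode, partitions = split_with_fallback(fake_run_split, "Holdings")
print(mode, partitions)
```

The retry costs a second request, so it only pays off when most documents work in `truncate` mode.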

***

## parsing

Split runs Parse internally before classifying sections. The `parsing` parameter accepts all Parse configuration options.

<CodeGroup>
  ```python Python theme={null}
  result = client.split.run(
      input=upload.file_id,
      split_description=[...],
      parsing={
          "settings": {
              "page_range": {"start": 1, "end": 50},
              "ocr_system": "standard"
          },
          "enhance": {
              "agentic": [{"scope": "table"}]
          }
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.split.run({
    input: upload.file_id,
    split_description: [...],
    parsing: {
      settings: {
        page_range: { start: 1, end: 50 },
        ocr_system: 'standard'
      },
      enhance: {
        agentic: [{ scope: 'table' }]
      }
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [...],
      "parsing": {
        "settings": {
          "page_range": {"start": 1, "end": 50},
          "ocr_system": "standard"
        },
        "enhance": {
          "agentic": [{"scope": "table"}]
        }
      }
    }'
  ```
</CodeGroup>

If you pass `jobid://` as input (reusing a previous Parse result), the `parsing` options are ignored since the document was already parsed.
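
If the same code path handles both fresh uploads and reused Parse jobs, a small wrapper can leave `parsing` out when it would be ignored anyway. This is a minimal sketch: `build_split_kwargs` is a hypothetical helper, the job ID is a placeholder, and the keyword names mirror the Python example above.

```python
def build_split_kwargs(input_ref, split_description, parsing=None):
    """Assemble keyword arguments for a Split call.

    Omits parsing for jobid:// inputs, since the document was already parsed.
    """
    kwargs = {"input": input_ref, "split_description": split_description}
    if parsing and not input_ref.startswith("jobid://"):
        kwargs["parsing"] = parsing
    return kwargs


sections = [{"name": "Holdings", "description": "Investment holdings table"}]
parsing = {"settings": {"page_range": {"start": 1, "end": 50}}}

fresh = build_split_kwargs("reducto://your-file-id", sections, parsing)
reused = build_split_kwargs("jobid://your-job-id", sections, parsing)
print("parsing" in fresh, "parsing" in reused)
```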

***

## Response Structure

```json theme={null}
{
  "result": {
    "splits": [
      {
        "name": "Section Name",
        "pages": [1, 2, 3],
        "conf": "high",
        "partitions": null
      }
    ],
    "section_mapping": {
      "Section Name": [1, 2, 3]
    }
  },
  "usage": {
    "num_pages": 10,
    "credits": 20.0
  }
}
```

**splits**: Array of sections, one per entry in `split_description`, including sections that were not found.

**splits\[].name**: The name you provided.

**splits\[].pages**: Page numbers where this section appears (1-indexed).

**splits\[].conf**: `"high"` or `"low"` indicating classification confidence.

**splits\[].partitions**: When using `partition_key`, sub-sections grouped by extracted identifier values. Each partition has its own `name` (the extracted value), `pages`, and `conf`.

**section\_mapping**: Legacy format mapping section names to page arrays. Use `splits` for new code.

A section not found in the document still appears in results with an empty pages array. Always check that `pages` has content before processing.
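
A minimal sketch of that check, assuming the response is a plain dictionary with the structure shown above. `usable_sections` is a hypothetical helper, and the sample response is invented for illustration. It skips empty sections and sets low-confidence ones aside for review.

```python
def usable_sections(response):
    """Split results into confident sections and ones that need review.

    Sections with an empty pages array (not found) are skipped entirely.
    """
    found, needs_review = {}, {}
    for split in response["result"]["splits"]:
        if not split["pages"]:
            continue
        target = found if split["conf"] == "high" else needs_review
        target[split["name"]] = split["pages"]
    return found, needs_review


# Invented sample data in the documented response shape.
response = {
    "result": {
        "splits": [
            {"name": "Account Summary", "pages": [1, 2], "conf": "high", "partitions": None},
            {"name": "Transaction History", "pages": [3, 4, 5], "conf": "low", "partitions": None},
            {"name": "Disclosures", "pages": [], "conf": "low", "partitions": None},
        ]
    }
}

found, needs_review = usable_sections(response)
print(found)
print(needs_review)
```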
