# Split

> Divide documents into logical sections for targeted processing

Split identifies which pages contain which sections of a document. You describe sections in natural language, and Reducto returns the page numbers where each section lives. Use Split to route document segments into different processing pipelines or to target extraction at specific sections.

Under the hood, Split runs [Parse](/parse/overview) to understand the document, then uses an LLM to classify pages against your descriptions.

<Info>
  **Split is not chunking.** Split returns page numbers telling you where sections live. Chunking (configured via Parse) breaks content into smaller pieces for embeddings or retrieval. They solve different problems: Split identifies locations, chunking divides content.
</Info>

![Split endpoint workflow](https://cdn.reducto.ai/documentation_images/SplitGraphic.png)

***

## Quick Start

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path
  from reducto import Reducto

  client = Reducto()
  upload = client.upload(file=Path("financial_report.pdf"))

  result = client.split.run(
      input=upload.file_id,
      split_description=[
          {
              "name": "Executive Summary",
              "description": "High-level overview and key findings at the beginning of the report"
          },
          {
              "name": "Financial Statements",
              "description": "Balance sheet, income statement, and cash flow tables"
          },
          {
              "name": "Risk Factors",
              "description": "Section discussing business risks and uncertainties"
          }
      ]
  )

  for split in result.result.splits:
      print(f"{split.name}: pages {split.pages}")
  ```

  ```javascript Node.js theme={null}
  import Reducto from 'reductoai';
  import fs from 'fs';

  const client = new Reducto();
  const upload = await client.upload({
    file: fs.createReadStream('financial_report.pdf'),
  });

  const result = await client.split.run({
    input: upload.file_id,
    split_description: [
      {
        name: 'Executive Summary',
        description: 'High-level overview and key findings at the beginning of the report'
      },
      {
        name: 'Financial Statements',
        description: 'Balance sheet, income statement, and cash flow tables'
      },
      {
        name: 'Risk Factors',
        description: 'Section discussing business risks and uncertainties'
      }
    ]
  });

  for (const split of result.result.splits) {
    console.log(`${split.name}: pages ${split.pages}`);
  }
  ```

  ```go Go theme={null}
  package main

  import (
      "context"
      "encoding/json"
      "fmt"
      "io"
      "os"

      reducto "github.com/reductoai/reducto-go-sdk"
      "github.com/reductoai/reducto-go-sdk/option"
      "github.com/reductoai/reducto-go-sdk/shared"
  )

  func main() {
      client := reducto.NewClient(option.WithAPIKey(os.Getenv("REDUCTO_API_KEY")))

      file, _ := os.Open("financial_report.pdf")
      defer file.Close()
      upload, _ := client.Upload(context.Background(), reducto.UploadParams{
          File: reducto.F[io.Reader](file),
      })

      result, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
          DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
              shared.UnionString(upload.FileID),
          ),
          SplitDescription: reducto.F([]shared.SplitCategoryParam{
              {
                  Name:        reducto.F("Executive Summary"),
                  Description: reducto.F("High-level overview and key findings"),
              },
              {
                  Name:        reducto.F("Financial Statements"),
                  Description: reducto.F("Balance sheet, income statement, and cash flow tables"),
              },
              {
                  Name:        reducto.F("Risk Factors"),
                  Description: reducto.F("Section discussing business risks and uncertainties"),
              },
          }),
      })

      // Access results via SectionMapping
      for name, pages := range result.Result.SectionMapping {
          fmt.Printf("%s: pages %v\n", name, pages)
      }
      
      // Or print full result as JSON
      resultJSON, _ := json.MarshalIndent(result, "", "  ")
      fmt.Println(string(resultJSON))
  }
  ```

  ```bash cURL theme={null}
  # First upload the file
  FILE_ID=$(curl -s -X POST https://platform.reducto.ai/upload \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -F "file=@financial_report.pdf" | jq -r '.file_id')

  # Then split
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "'$FILE_ID'",
      "split_description": [
        {
          "name": "Executive Summary",
          "description": "High-level overview and key findings at the beginning of the report"
        },
        {
          "name": "Financial Statements",
          "description": "Balance sheet, income statement, and cash flow tables"
        },
        {
          "name": "Risk Factors",
          "description": "Section discussing business risks and uncertainties"
        }
      ]
    }'
  ```
</CodeGroup>

This request asks Split to find three sections in a financial report. Split returns the page numbers where each section appears, along with a confidence level (`high` or `low`) for each match.

### Sample Response

```json theme={null}
{
  "result": {
    "splits": [
      {"name": "Executive Summary", "pages": [1, 2], "conf": "high", "partitions": null},
      {"name": "Financial Statements", "pages": [15, 16, 17, 18], "conf": "high", "partitions": null},
      {"name": "Risk Factors", "pages": [8, 9, 10, 11, 12], "conf": "high", "partitions": null}
    ],
    "section_mapping": {
      "Executive Summary": [1, 2],
      "Financial Statements": [15, 16, 17, 18],
      "Risk Factors": [8, 9, 10, 11, 12]
    }
  },
  "usage": {"num_pages": 25, "credits": 50.0}
}
```

Pages are 1-indexed, meaning the first page is page 1, not 0.
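
Many PDF libraries and Python lists count from 0, so the page numbers usually need shifting before you use them elsewhere. The sketch below shows one way to do that; `to_zero_indexed` is a hypothetical helper name, not part of the Reducto SDK.

```python
def to_zero_indexed(pages: list[int]) -> list[int]:
    """Convert Split's 1-indexed page numbers to 0-indexed positions."""
    if any(page < 1 for page in pages):
        raise ValueError("Split page numbers start at 1")
    return [page - 1 for page in pages]


# Page numbers taken from the "Executive Summary" entry in the sample response.
executive_summary_pages = [1, 2]
print(to_zero_indexed(executive_summary_pages))  # [0, 1]
```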

***

## Two Ways to Split

Split handles two fundamentally different scenarios.

### Scenario 1: Different Sections Need Different Treatment

Your document contains distinct sections that each need their own extraction schema or processing logic. A financial report has an executive summary (extract key metrics), financial tables (extract line items), and risk disclosures (extract risk categories). These are different types of content requiring different approaches.

For this, you define multiple entries in `split_description`, each describing a different section:

<CodeGroup>
  ```python Python theme={null}
  result = client.split.run(
      input=upload.file_id,
      split_description=[
          {
              "name": "Account Summary",
              "description": "Overview section with account balances and totals"
          },
          {
              "name": "Transaction History",
              "description": "Table listing individual transactions with dates and amounts"
          },
          {
              "name": "Disclosures",
              "description": "Legal disclosures and terms at the end of the statement"
          }
      ]
  )

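  # Route each section to appropriate processing.
  # parse_job_id and transaction_schema are assumed to be defined earlier
  # (see "Connecting Split to Parse and Extract" below).
  for split in result.result.splits:
      # Skip the section if Split did not find it (empty pages list)
      if split.name == "Transaction History" and split.pages:
          transactions = client.extract.run(
              input=f"jobid://{parse_job_id}",
              instructions={"schema": transaction_schema},
              settings={"array_extract": True},
              parsing={"settings": {"page_range": {"start": split.pages[0], "end": split.pages[-1]}}}
          )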
  ```

  ```javascript Node.js theme={null}
  const result = await client.split.run({
    input: upload.file_id,
    split_description: [
      {
        name: 'Account Summary',
        description: 'Overview section with account balances and totals'
      },
      {
        name: 'Transaction History',
        description: 'Table listing individual transactions with dates and amounts'
      },
      {
        name: 'Disclosures',
        description: 'Legal disclosures and terms at the end of the statement'
      }
    ]
  });

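  // Route each section to appropriate processing.
  // parseJobId and transactionSchema are assumed to be defined earlier
  // (see "Connecting Split to Parse and Extract" below).
  for (const split of result.result.splits) {
    // Skip the section if Split did not find it (empty pages array)
    if (split.name === 'Transaction History' && split.pages.length > 0) {
      const transactions = await client.extract.run({
        input: `jobid://${parseJobId}`,
        instructions: { schema: transactionSchema },
        settings: { array_extract: true },
        parsing: { settings: { page_range: { start: split.pages[0], end: split.pages.at(-1) } } }
      });
    }
  }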
  ```

  ```go Go theme={null}
  result, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{
          {
              Name:        reducto.F("Account Summary"),
              Description: reducto.F("Overview section with account balances and totals"),
          },
          {
              Name:        reducto.F("Transaction History"),
              Description: reducto.F("Table listing individual transactions with dates and amounts"),
          },
          {
              Name:        reducto.F("Disclosures"),
              Description: reducto.F("Legal disclosures and terms at the end of the statement"),
          },
      }),
  })

  // Route each section to appropriate processing
  for name, pages := range result.Result.SectionMapping {
      if name == "Transaction History" {
          fmt.Printf("Processing %s on pages %v\n", name, pages)
          // Extract with appropriate schema for this section
      }
  }
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [
        {"name": "Account Summary", "description": "Overview section with account balances and totals"},
        {"name": "Transaction History", "description": "Table listing individual transactions with dates and amounts"},
        {"name": "Disclosures", "description": "Legal disclosures and terms at the end of the statement"}
      ]
    }'
  ```
</CodeGroup>
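
When several sections each need their own processing, a lookup table of handlers can keep the routing readable. This is a minimal sketch that works on plain dictionaries shaped like the `splits` array; `HANDLERS`, `handle_summary`, `handle_transactions`, and `route_sections` are hypothetical names, and the page numbers are made up for illustration.

```python
from typing import Callable


# Hypothetical handlers: in a real pipeline each one would call Extract
# with the schema that suits its section.
def handle_summary(pages: list[int]) -> str:
    return f"summary extraction on pages {pages}"


def handle_transactions(pages: list[int]) -> str:
    return f"transaction extraction on pages {pages}"


HANDLERS: dict[str, Callable[[list[int]], str]] = {
    "Account Summary": handle_summary,
    "Transaction History": handle_transactions,
}


def route_sections(splits: list[dict]) -> dict[str, str]:
    """Send each found section to its handler; skip unknown or empty sections."""
    outcomes = {}
    for split in splits:
        handler = HANDLERS.get(split["name"])
        if handler is None or not split["pages"]:
            continue
        outcomes[split["name"]] = handler(split["pages"])
    return outcomes


# Illustrative splits; "Disclosures" has no handler, so it is skipped.
example_splits = [
    {"name": "Account Summary", "pages": [1], "conf": "high", "partitions": None},
    {"name": "Transaction History", "pages": [2, 3, 4], "conf": "high", "partitions": None},
    {"name": "Disclosures", "pages": [5], "conf": "high", "partitions": None},
]
print(route_sections(example_splits))
```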

### Scenario 2: Repeating Sections with Unknown Count

Your document contains the same type of section repeated multiple times, but you don't know in advance how many. A consolidated financial statement might have holdings for 3 accounts or 30. A medical records packet might contain intake forms for 5 patients or 50.

This is where `partition_key` becomes essential.

Without a partition key, Split returns all pages containing "account holdings" as a single group. You'd then need to figure out where one account ends and the next begins. The partition key tells Split to look for a specific identifier within each section and group the pages by that identifier.

<CodeGroup>
  ```python Python theme={null}
  result = client.split.run(
      input=upload.file_id,
      split_description=[
          {
              "name": "Account Holdings",
              "description": "Investment holdings table for a specific account",
              "partition_key": "account_number"
          }
      ]
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.split.run({
    input: upload.file_id,
    split_description: [
      {
        name: 'Account Holdings',
        description: 'Investment holdings table for a specific account',
        partition_key: 'account_number'
      }
    ]
  });
  ```

  ```go Go theme={null}
  result, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{
          {
              Name:         reducto.F("Account Holdings"),
              Description:  reducto.F("Investment holdings table for a specific account"),
              PartitionKey: reducto.F("account_number"),
          },
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [
        {
          "name": "Account Holdings",
          "description": "Investment holdings table for a specific account",
          "partition_key": "account_number"
        }
      ]
    }'
  ```
</CodeGroup>

The response now includes a `partitions` array that breaks down the section by the values Split found in the document:

```json theme={null}
{
  "result": {
    "splits": [
      {
        "name": "Account Holdings",
        "pages": [1, 2, 3, 7, 8, 9, 10, 11],
        "conf": "high",
        "partitions": [
          {"name": "1234-5678", "pages": [1, 2, 3], "conf": "high"},
          {"name": "8765-4321", "pages": [7, 8, 9, 10, 11], "conf": "high"}
        ]
      }
    ],
    "section_mapping": {
      "Account Holdings 1234-5678": [1, 2, 3],
      "Account Holdings 8765-4321": [7, 8, 9, 10, 11]
    }
  }
}
```

The `name` in each partition is the actual value Split extracted from the document. If the document shows "Account #1234-5678" on pages 1-3 and "Account #8765-4321" on pages 7-11, those become your partition names.

<Warning>
  The partition key describes what to look for semantically, not an exact string to match. If you set `partition_key` to "account number" but the document says "Acct #1234", Split will still find it.
</Warning>
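
To process partitioned and unpartitioned sections with the same loop, you can flatten the response into one row per partition. The sketch below assumes the response has been loaded as plain dictionaries with the fields shown in the JSON above; `flatten_partitions` is a hypothetical helper name.

```python
from typing import Optional


def flatten_partitions(splits: list[dict]) -> list[tuple[str, Optional[str], list[int]]]:
    """Return (section name, partition name or None, pages) rows."""
    rows = []
    for split in splits:
        partitions = split.get("partitions")
        if partitions:
            # One row per partition value found in the document
            for part in partitions:
                rows.append((split["name"], part["name"], part["pages"]))
        else:
            # No partition_key was used (partitions is null): keep the section whole
            rows.append((split["name"], None, split["pages"]))
    return rows


# Same data as the "Account Holdings" response above.
splits = [
    {
        "name": "Account Holdings",
        "pages": [1, 2, 3, 7, 8, 9, 10, 11],
        "conf": "high",
        "partitions": [
            {"name": "1234-5678", "pages": [1, 2, 3], "conf": "high"},
            {"name": "8765-4321", "pages": [7, 8, 9, 10, 11], "conf": "high"},
        ],
    }
]
for section, partition, pages in flatten_partitions(splits):
    print(section, partition, pages)
```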

***

## Connecting Split to Parse and Extract

Split is rarely used in isolation. The typical workflow is Parse → Split → Extract, where each step builds on the previous one.

You can reuse a Parse result across multiple Split and Extract calls by passing the job ID. Since Parse is often the slowest step, this saves significant time and credits.

<CodeGroup>
  ```python Python theme={null}
  # Step 1: Parse the document once
  parse_result = client.parse.run(input=upload.file_id)
  job_id = parse_result.job_id

  # Step 2: Split using the job ID (no re-parsing)
  split_result = client.split.run(
      input=f"jobid://{job_id}",
      split_description=[
          {"name": "Summary", "description": "Account summary with balances"},
          {"name": "Transactions", "description": "Transaction history table"}
      ]
  )

  # Step 3: Extract from each section with the appropriate schema
  summary_schema = {
      "type": "object",
      "properties": {
          "account_number": {"type": "string"},
          "current_balance": {"type": "number"},
          "available_balance": {"type": "number"}
      }
  }

  transaction_schema = {
      "type": "object",
      "properties": {
          "transactions": {
              "type": "array",
              "items": {
                  "type": "object",
                  "properties": {
                      "date": {"type": "string"},
                      "description": {"type": "string"},
                      "amount": {"type": "number"}
                  }
              }
          }
      }
  }

  for split in split_result.result.splits:
      if not split.pages:
          continue  # Section not found; nothing to extract

      schema = summary_schema if split.name == "Summary" else transaction_schema

      extract_result = client.extract.run(
          input=f"jobid://{job_id}",
          instructions={"schema": schema},
          parsing={"settings": {"page_range": {"start": split.pages[0], "end": split.pages[-1]}}}
      )

      print(f"{split.name}: {extract_result.result}")
  ```

  ```javascript Node.js theme={null}
  // Step 1: Parse the document once
  const parseResult = await client.parse.run({ input: upload.file_id });
  const jobId = parseResult.job_id;

  // Step 2: Split using the job ID (no re-parsing)
  const splitResult = await client.split.run({
    input: `jobid://${jobId}`,
    split_description: [
      { name: 'Summary', description: 'Account summary with balances' },
      { name: 'Transactions', description: 'Transaction history table' }
    ]
  });

  // Step 3: Extract from each section with the appropriate schema
  const summarySchema = {
    type: 'object',
    properties: {
      account_number: { type: 'string' },
      current_balance: { type: 'number' },
      available_balance: { type: 'number' }
    }
  };

  const transactionSchema = {
    type: 'object',
    properties: {
      transactions: {
        type: 'array',
        items: {
          type: 'object',
          properties: {
            date: { type: 'string' },
            description: { type: 'string' },
            amount: { type: 'number' }
          }
        }
      }
    }
  };

  for (const split of splitResult.result.splits) {
    if (split.pages.length === 0) {
      continue; // Section not found; nothing to extract
    }

    const schema = split.name === 'Summary' ? summarySchema : transactionSchema;

    const extractResult = await client.extract.run({
      input: `jobid://${jobId}`,
      instructions: { schema },
      parsing: { settings: { page_range: { start: split.pages[0], end: split.pages.at(-1) } } }
    });

    console.log(`${split.name}:`, extractResult.result);
  }
  ```

  ```go Go theme={null}
  // Step 1: Parse the document once
  parseResult, _ := client.Parse.Run(context.Background(), reducto.ParseRunParams{
      DocumentURL: reducto.F[reducto.ParseRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
  })
  jobID := parseResult.JobID

  // Step 2: Split using the job ID (no re-parsing)
  splitResult, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString("jobid://" + jobID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{
          {Name: reducto.F("Summary"), Description: reducto.F("Account summary with balances")},
          {Name: reducto.F("Transactions"), Description: reducto.F("Transaction history table")},
      }),
  })

  // Step 3: Extract from each section using SectionMapping
  for name, pages := range splitResult.Result.SectionMapping {
      fmt.Printf("%s: processing pages %v\n", name, pages)
      // Use pages to set page_range for extraction
  }
  ```

  ```bash cURL theme={null}
  # Step 1: Parse the document once
  PARSE_RESPONSE=$(curl -s -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"document_url": "reducto://your-file-id"}')
  JOB_ID=$(echo "$PARSE_RESPONSE" | jq -r '.job_id')

  # Step 2: Split using the job ID
  SPLIT_RESPONSE=$(curl -s -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "jobid://'$JOB_ID'",
      "split_description": [
        {"name": "Summary", "description": "Account summary with balances"},
        {"name": "Transactions", "description": "Transaction history table"}
      ]
    }')

  echo "$SPLIT_RESPONSE" | jq '.result.splits'
  ```
</CodeGroup>

When you pass `jobid://` as input, the parsing step is skipped entirely. Any `parsing` options you include won't re-parse the document; they only affect how the already-parsed content is filtered (like limiting which pages to consider for extraction).
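
The examples above pass `split.pages[0]` and `split.pages[-1]` as a single `page_range`, which also covers any pages in between. When a section's pages are not contiguous (as in the Account Holdings response, where pages 4 to 6 are absent), you may prefer one range per contiguous run. This is a minimal sketch, assuming `page_range` bounds are inclusive; `contiguous_ranges` is a hypothetical helper name.

```python
def contiguous_ranges(pages: list[int]) -> list[dict[str, int]]:
    """Group page numbers into inclusive {"start", "end"} runs."""
    ranges: list[dict[str, int]] = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1]["end"] + 1:
            # Page continues the current run
            ranges[-1]["end"] = page
        else:
            # Gap found (or first page): start a new run
            ranges.append({"start": page, "end": page})
    return ranges


print(contiguous_ranges([1, 2, 3, 7, 8, 9, 10, 11]))
# [{'start': 1, 'end': 3}, {'start': 7, 'end': 11}]
```

Each resulting dictionary could then be passed as the `page_range` of a separate Extract call.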

***

## Request Parameters

### input (required)

The document to process. Accepts:

| Format          | Example                                    | Description                              |
| --------------- | ------------------------------------------ | ---------------------------------------- |
| Upload response | `upload.file_id` or `"reducto://abc123"`   | File uploaded via `/upload`              |
| Public URL      | `"https://example.com/doc.pdf"`            | Publicly accessible document             |
| Presigned URL   | `"https://bucket.s3.../doc.pdf?X-Amz-..."` | Cloud storage with temporary credentials |
| Job ID          | `"jobid://7600c8c5-..."`                   | Reuse a previous Parse result            |

### split\_description (required)

An array defining the sections to find. Each entry has:

| Field           | Required | Description                                                                          |
| --------------- | -------- | ------------------------------------------------------------------------------------ |
| `name`          | Yes      | Identifier for this section in the response                                          |
| `description`   | Yes      | Natural language description of what the section contains                            |
| `partition_key` | No       | Identifier to look for when a section repeats (e.g., "account number", "patient ID") |

Write descriptions that match how the content actually appears in the document. If the section has visual characteristics ("blue header", "signature line at bottom"), mention them.
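
If you build `split_description` programmatically, a small constructor can catch missing required fields before the request is sent. The sketch below only mirrors the table above; `make_section` is a hypothetical helper, not an SDK function.

```python
from typing import Optional


def make_section(name: str, description: str, partition_key: Optional[str] = None) -> dict:
    """Build one split_description entry, enforcing the required fields."""
    if not name.strip():
        raise ValueError("name is required")
    if not description.strip():
        raise ValueError("description is required")
    section = {"name": name, "description": description}
    if partition_key is not None:
        # Only include the optional field when the section repeats
        section["partition_key"] = partition_key
    return section


split_description = [
    make_section(
        "Executive Summary",
        "High-level overview and key findings at the beginning of the report",
    ),
    make_section(
        "Account Holdings",
        "Investment holdings table for a specific account",
        partition_key="account_number",
    ),
]
print(split_description)
```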

### split\_rules

A prompt that controls how Split handles page classification. The default is:

```
"Split the document into the applicable sections. Sections may only overlap at their first and last page if at all."
```

This default means a page can only belong to multiple sections if it's at the boundary between them. Page 5 can belong to both "Section A" and "Section B" only if it's the last page of A and the first page of B.

You can customize this behavior for your use case:

<CodeGroup>
  ```python Python theme={null}
  # Allow full overlap when content genuinely spans multiple categories
  result = client.split.run(
      input=upload.file_id,
      split_description=[...],
      split_rules="Pages can belong to multiple sections. A page with both summary information and transaction data should be included in both sections."
  )

  # Force exclusive classification
  result = client.split.run(
      input=upload.file_id,
      split_description=[...],
      split_rules="Each page must belong to exactly one section. Choose the most relevant section for each page."
  )
  ```

  ```javascript Node.js theme={null}
  // Allow full overlap
  const overlapResult = await client.split.run({
    input: upload.file_id,
    split_description: [...],
    split_rules: 'Pages can belong to multiple sections. A page with both summary information and transaction data should be included in both sections.'
  });

  // Force exclusive classification
  const exclusiveResult = await client.split.run({
    input: upload.file_id,
    split_description: [...],
    split_rules: 'Each page must belong to exactly one section. Choose the most relevant section for each page.'
  });
  ```

  ```go Go theme={null}
  // Allow full overlap
  overlapResult, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{...}),
      SplitRules: reducto.F("Pages can belong to multiple sections."),
  })

  // Force exclusive classification
  exclusiveResult, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{...}),
      SplitRules: reducto.F("Each page must belong to exactly one section."),
  })
  ```

  ```bash cURL theme={null}
  # Allow full overlap
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [...],
      "split_rules": "Pages can belong to multiple sections."
    }'
  ```
</CodeGroup>

The `split_rules` string is passed directly to the LLM as instructions, so write it as you would write instructions for a person doing the classification.
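
Since the rules are instructions to an LLM, you may want to check the returned pages against the rule you asked for. The sketch below tests the default rule (sections may only overlap at their first and last page) on a `section_mapping`-style dictionary; `find_interior_overlaps` and the sample mappings are hypothetical.

```python
from collections import defaultdict


def find_interior_overlaps(section_mapping: dict[str, list[int]]) -> dict[int, list[str]]:
    """Return shared pages that are not a boundary page of every section claiming them."""
    owners: dict[int, list[str]] = defaultdict(list)
    for name, pages in section_mapping.items():
        for page in pages:
            owners[page].append(name)

    violations = {}
    for page, names in owners.items():
        if len(names) < 2:
            continue  # Page belongs to a single section
        # A shared page is acceptable only if it is the first or last page
        # of each section that contains it.
        if any(page not in (min(section_mapping[n]), max(section_mapping[n])) for n in names):
            violations[page] = sorted(names)
    return violations


# Page 5 is the last page of A and the first page of B: allowed by the default rule.
print(find_interior_overlaps({"Section A": [3, 4, 5], "Section B": [5, 6, 7]}))  # {}

# Pages 4 and 5 overlap in the interior of at least one section: flagged.
print(find_interior_overlaps({"Section A": [3, 4, 5], "Section B": [4, 5, 6]}))
```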

### parsing

Configuration for how the document is parsed. These options are inherited from [Parse](/parse/overview). When your `input` is a `jobid://` reference, the document is not re-parsed, so parse-time options have no effect; options that filter the already-parsed content, such as `page_range`, still apply.

<CodeGroup>
  ```python Python theme={null}
  result = client.split.run(
      input=upload.file_id,
      split_description=[...],
      parsing={
          "settings": {
              "page_range": {"start": 1, "end": 50}  # Only analyze first 50 pages
          }
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.split.run({
    input: upload.file_id,
    split_description: [...],
    parsing: {
      settings: {
        page_range: { start: 1, end: 50 }  // Only analyze first 50 pages
      }
    }
  });
  ```

  ```go Go theme={null}
  result, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{...}),
      Options: reducto.F(shared.BaseProcessingOptionsParam{
          PageRange: reducto.F(shared.PageRangeParam{
              Start: reducto.F(int64(1)),
              End:   reducto.F(int64(50)),
          }),
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [...],
      "options": {
        "page_range": {"start": 1, "end": 50}
      }
    }'
  ```
</CodeGroup>

### settings

| Field          | Values                       | Default      | Description                                        |
| -------------- | ---------------------------- | ------------ | -------------------------------------------------- |
| `table_cutoff` | `"truncate"` or `"preserve"` | `"truncate"` | How to handle table content when classifying pages |

When analyzing tables, Split truncates them by default to improve speed. This works fine for most cases, but if your `partition_key` values appear deep within tables (row 50 of a 200-row table), you need the full content:

<CodeGroup>
  ```python Python theme={null}
  result = client.split.run(
      input=upload.file_id,
      split_description=[
          {
              "name": "Holdings",
              "description": "Investment holdings table",
              "partition_key": "account_number"
          }
      ],
      settings={"table_cutoff": "preserve"}
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.split.run({
    input: upload.file_id,
    split_description: [
      {
        name: 'Holdings',
        description: 'Investment holdings table',
        partition_key: 'account_number'
      }
    ],
    settings: { table_cutoff: 'preserve' }
  });
  ```

  ```go Go theme={null}
  result, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
      DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{
          {
              Name:         reducto.F("Holdings"),
              Description:  reducto.F("Investment holdings table"),
              PartitionKey: reducto.F("account_number"),
          },
      }),
      // Note: table_cutoff setting may be in experimental options
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/split \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [
        {
          "name": "Holdings",
          "description": "Investment holdings table",
          "partition_key": "account_number"
        }
      ],
      "settings": {"table_cutoff": "preserve"}
    }'
  ```
</CodeGroup>

The tradeoff is latency. Preserving tables means more content for the LLM to process.

***

## Sample Response

```json theme={null}
{
  "result": {
    "splits": [
      {
        "name": "Section Name",
        "pages": [1, 2, 3],
        "conf": "high",
        "partitions": null
      }
    ],
    "section_mapping": {
      "Section Name": [1, 2, 3]
    }
  },
  "usage": {
    "num_pages": 10,
    "credits": 20.0
  }
}
```

| Field                        | Type          | Description                                                                          |
| ---------------------------- | ------------- | ------------------------------------------------------------------------------------ |
| `result.splits`              | array         | Array of found sections, one per entry in your `split_description`                   |
| `result.splits[].name`       | string        | The name you provided                                                                |
| `result.splits[].pages`      | array         | Page numbers where this section appears (1-indexed)                                  |
| `result.splits[].conf`       | string        | Either `"high"` or `"low"` indicating match confidence                               |
| `result.splits[].partitions` | array \| null | When using `partition_key`, sub-sections with their own names, pages, and confidence |
| `result.section_mapping`     | object        | Legacy format mapping section names to page arrays. Use `splits` for new code.       |
| `usage.num_pages`            | number        | Total pages in the document                                                          |
| `usage.credits`              | number        | Credits consumed (2 per page, plus Parse credits if not using `jobid://`)            |
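
The Split portion of the credit cost follows directly from the page count. A minimal sketch of the arithmetic, assuming the rate of 2 credits per page stated in the table; `estimate_split_credits` is a hypothetical helper, and `parse_credits` is a value you supply yourself because Parse pricing is not covered on this page.

```python
def estimate_split_credits(num_pages: int, parse_credits: float = 0.0) -> float:
    """Estimate credits for one Split call: 2 per page plus any Parse credits."""
    if num_pages < 0:
        raise ValueError("num_pages must be non-negative")
    return 2.0 * num_pages + parse_credits


# Matches the sample response above: 10 pages, 20.0 credits.
print(estimate_split_credits(10))  # 20.0
```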

A section that isn't found still appears in the response with an empty pages array. Always check that `pages` has content before processing:

<CodeGroup>
  ```python Python theme={null}
  for split in result.result.splits:
      if not split.pages:
          print(f"Warning: {split.name} not found in document")
          continue
      # Process the section
  ```

  ```javascript Node.js theme={null}
  for (const split of result.result.splits) {
    if (!split.pages || split.pages.length === 0) {
      console.log(`Warning: ${split.name} not found in document`);
      continue;
    }
    // Process the section
  }
  ```

  ```go Go theme={null}
  for name, pages := range result.Result.SectionMapping {
      if len(pages) == 0 {
          fmt.Printf("Warning: %s not found in document\n", name)
          continue
      }
      fmt.Printf("Processing %s on pages %v\n", name, pages)
  }
  ```
</CodeGroup>
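
The `conf` field can be folded into the same check, so that low-confidence matches go to manual review instead of straight into extraction. This is one possible triage policy, not a recommendation from the API; `triage_splits`, the bucket names, and the sample data are hypothetical.

```python
def triage_splits(splits: list[dict]) -> dict[str, list[str]]:
    """Sort section names into ready / review / missing buckets."""
    buckets: dict[str, list[str]] = {"ready": [], "review": [], "missing": []}
    for split in splits:
        if not split["pages"]:
            buckets["missing"].append(split["name"])   # Section not found
        elif split["conf"] == "low":
            buckets["review"].append(split["name"])    # Found, but uncertain
        else:
            buckets["ready"].append(split["name"])     # Found with high confidence
    return buckets


# Illustrative data shaped like result.splits
sample_splits = [
    {"name": "Summary", "pages": [1, 2], "conf": "high", "partitions": None},
    {"name": "Transactions", "pages": [3, 4], "conf": "low", "partitions": None},
    {"name": "Disclosures", "pages": [], "conf": "low", "partitions": None},
]
print(triage_splits(sample_splits))
```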

***

## Async Processing

For large documents or batch processing, use the async pattern to avoid timeouts:

<CodeGroup>
  ```python Python theme={null}
  import time

  submission = client.split.run_job(
      input=upload.file_id,
      split_description=[...]
  )

  while True:
      job = client.job.get(submission.job_id)
      if job.status == "Completed":
          break
      if job.status == "Failed":
          raise Exception(f"Split failed: {job.reason}")
      time.sleep(2)

  for split in job.result.splits:
      print(f"{split.name}: {split.pages}")
  ```

  ```javascript Node.js theme={null}
  const submission = await client.split.runJob({
    input: upload.file_id,
    split_description: [...]
  });

  let job;
  while (true) {
    job = await client.job.retrieve(submission.job_id);
    if (job.status === 'Completed') break;
    if (job.status === 'Failed') throw new Error(`Split failed: ${job.reason}`);
    await new Promise(resolve => setTimeout(resolve, 2000));
  }

  for (const split of job.result.splits) {
    console.log(`${split.name}: ${split.pages}`);
  }
  ```

  ```go Go theme={null}
  import (
      "context"
      "encoding/json"
      "fmt"
      "time"
  )

  submission, _ := client.Split.RunJob(context.Background(), reducto.SplitRunJobParams{
      DocumentURL: reducto.F[reducto.SplitRunJobParamsDocumentURLUnion](
          shared.UnionString(upload.FileID),
      ),
      SplitDescription: reducto.F([]shared.SplitCategoryParam{...}),
  })

  for {
      job, _ := client.Job.Get(context.Background(), submission.JobID)
      if job.Status == "Completed" {
          // Print the whole job response as JSON; it includes the section-to-pages mapping
          resultJSON, _ := json.MarshalIndent(job, "", "  ")
          fmt.Println(string(resultJSON))
          break
      }
      if job.Status == "Failed" {
          fmt.Println("Job failed:", job.Reason)
          break
      }
      time.Sleep(2 * time.Second)
  }
  ```

  ```bash cURL theme={null}
  # Submit async job
  JOB_ID=$(curl -s -X POST https://platform.reducto.ai/split_async \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "document_url": "reducto://your-file-id",
      "split_description": [...]
    }' | jq -r '.job_id')

  # Poll for completion
  while true; do
    STATUS=$(curl -s "https://platform.reducto.ai/job/$JOB_ID" \
      -H "Authorization: Bearer $REDUCTO_API_KEY" | jq -r '.status')
    if [ "$STATUS" = "Completed" ]; then break; fi
    if [ "$STATUS" = "Failed" ]; then echo "Job failed"; exit 1; fi
    sleep 2
  done

  # Get results
  curl -s "https://platform.reducto.ai/job/$JOB_ID" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" | jq '.result'
  ```
</CodeGroup>

Documents over 100 pages should use async to avoid HTTP timeouts. The `.run_job()` method accepts the same parameters as `.run()`.
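
The polling loops above run until the job reaches a terminal status. For production use, you may want an overall deadline and a growing delay between polls. The following is a minimal sketch, not part of the SDK: `wait_for_job` and its parameters are hypothetical names, and it only assumes that the object returned by the fetch function has `status` and `reason` attributes, as in the examples above.

```python
import time
from typing import Any, Callable


def wait_for_job(
    fetch_job: Callable[[str], Any],
    job_id: str,
    timeout_s: float = 600.0,
    initial_delay_s: float = 2.0,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Poll until the job completes, fails, or the deadline passes."""
    deadline = time.monotonic() + timeout_s
    delay = initial_delay_s
    while True:
        job = fetch_job(job_id)
        if job.status == "Completed":
            return job
        if job.status == "Failed":
            raise RuntimeError(f"Split failed: {job.reason}")
        # Give up if the next sleep would take us past the deadline.
        if time.monotonic() + delay > deadline:
            raise TimeoutError(f"Job {job_id} did not finish within {timeout_s} seconds")
        sleep(delay)
        # Exponential backoff, capped at max_delay_s.
        delay = min(delay * 2, max_delay_s)


# Usage with the client from the examples above:
# job = wait_for_job(client.job.get, submission.job_id)
```

Passing the fetch function as an argument keeps the helper independent of the client, so it can be exercised with a stub in tests.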

***

## Troubleshooting

<AccordionGroup>
  <Accordion title="I don't know what sections my document has">
    If you're unsure of the document structure, start with broad, generic descriptions:

    <CodeGroup>
      ```python Python theme={null}
      result = client.split.run(
          input=upload.file_id,
          split_description=[
              {"name": "Introduction", "description": "Opening sections, executive summary, or overview"},
              {"name": "Main Content", "description": "Core content, analysis, or detailed information"},
              {"name": "Tables/Data", "description": "Tables, figures, numerical data, or structured information"},
              {"name": "Appendix", "description": "Supporting materials, references, or supplementary content"}
          ]
      )
      ```

      ```javascript Node.js theme={null}
      const result = await client.split.run({
        input: upload.file_id,
        split_description: [
          { name: 'Introduction', description: 'Opening sections, executive summary, or overview' },
          { name: 'Main Content', description: 'Core content, analysis, or detailed information' },
          { name: 'Tables/Data', description: 'Tables, figures, numerical data, or structured information' },
          { name: 'Appendix', description: 'Supporting materials, references, or supplementary content' }
        ]
      });
      ```

      ```go Go theme={null}
      result, _ := client.Split.Run(context.Background(), reducto.SplitRunParams{
          DocumentURL: reducto.F[reducto.SplitRunParamsDocumentURLUnion](
              shared.UnionString(upload.FileID),
          ),
          SplitDescription: reducto.F([]shared.SplitCategoryParam{
              {Name: reducto.F("Introduction"), Description: reducto.F("Opening sections, executive summary, or overview")},
              {Name: reducto.F("Main Content"), Description: reducto.F("Core content, analysis, or detailed information")},
              {Name: reducto.F("Tables/Data"), Description: reducto.F("Tables, figures, numerical data, or structured information")},
              {Name: reducto.F("Appendix"), Description: reducto.F("Supporting materials, references, or supplementary content")},
          }),
      })

      for name, pages := range result.Result.SectionMapping {
          fmt.Printf("%s: pages %v\n", name, pages)
      }
      ```

      ```bash cURL theme={null}
      curl -X POST https://platform.reducto.ai/split \
        -H "Authorization: Bearer $REDUCTO_API_KEY" \
        -H "Content-Type: application/json" \
        -d '{
          "document_url": "reducto://your-file-id",
          "split_description": [
            {"name": "Introduction", "description": "Opening sections, executive summary, or overview"},
            {"name": "Main Content", "description": "Core content, analysis, or detailed information"},
            {"name": "Tables/Data", "description": "Tables, figures, numerical data, or structured information"},
            {"name": "Appendix", "description": "Supporting materials, references, or supplementary content"}
          ]
        }'
      ```
    </CodeGroup>

    Alternatively, run Parse on the document first, inspect the output to understand its structure, and then write targeted split descriptions.
  </Accordion>

  <Accordion title="Section not found (empty pages array)">
    When a section returns with no pages:

    1. **Check your description.** Is it specific enough? "Transaction table" might not match if the document calls it "Activity History". Include terms that appear in the actual document.

    2. **Verify the section exists.** Run Parse first and inspect the content. If the section isn't visible to Parse, Split won't find it either.

    3. **Broaden your description.** Start general ("any table with dates and amounts") and narrow down once you confirm Split can find it.
  </Accordion>

  <Accordion title="Partitions not being detected">
    When `partitions` is null despite setting `partition_key`:

    1. **Check table\_cutoff.** If the partition key appears inside tables, set `settings.table_cutoff` to `"preserve"`. The default truncation might be hiding the values.

    2. **Verify the key exists.** The partition key value must actually appear in the document. If you're looking for "account number" but the document uses "portfolio ID", adjust your partition key.

    3. **Check for consistent structure.** Partition detection works best when repeating sections have similar layouts. Inconsistent formatting can confuse the classifier.
  </Accordion>

  <Accordion title="Wrong pages returned">
    When Split returns pages that don't contain the expected content:

    1. **Make descriptions more specific.** If multiple sections have similar content, add distinguishing details: "the transaction table in the Account Activity section" rather than just "transaction table".

    2. **Check confidence scores.** A low confidence score means the match was ambiguous: the LLM made its best guess but wasn't certain.

    3. **Adjust split\_rules.** The default overlap rules might be affecting page assignment. If a page legitimately belongs to multiple sections, customize `split_rules` to allow it.
  </Accordion>

  <Accordion title="Request timeout on large documents">
    Split can time out on documents over 100 pages:

    1. **Use async processing.** Replace `.run()` with `.run_job()` and poll for results.

    2. **Parse first, then split.** If you're not already using `jobid://`, parse the document separately and pass the resulting Parse job ID to Split with the `jobid://` prefix. This isolates the slow parsing step from the Split request.

    3. **Limit page range.** If you know the sections you need are in a specific range, set `parsing.settings.page_range` to process only those pages.
  </Accordion>
</AccordionGroup>
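
The "parse first, then split" step from the timeout entry above can be sketched as follows. This is a sketch under assumptions rather than a definitive implementation: `split_from_parse_job` is a hypothetical helper, and it assumes that `client.parse.run` accepts `input=` the same way `client.split.run` does and that the Parse response exposes a `job_id` attribute. Neither detail is shown on this page, so check the Parse reference before relying on it.

```python
from typing import Any, Dict, List


def split_from_parse_job(
    client: Any,
    file_id: str,
    split_description: List[Dict[str, str]],
) -> Any:
    """Run Parse once, then point Split at the finished Parse job."""
    # Assumption: parse.run takes input= and returns an object with .job_id.
    parse = client.parse.run(input=file_id)
    # The jobid:// prefix tells Split to reuse the existing Parse output
    # instead of parsing the document again.
    return client.split.run(
        input=f"jobid://{parse.job_id}",
        split_description=split_description,
    )
```

Because the Parse job ID can be reused, retrying Split with revised descriptions does not repeat the parsing step.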

***

## Next Steps

<CardGroup cols={2}>
  <Card title="Parse" icon="file-lines" href="/parse/overview">
    Understand what Split is analyzing under the hood.
  </Card>

  <Card title="Extract" icon="table" href="/extract/overview">
    Pull structured data from the sections Split identifies.
  </Card>

  <Card title="Page Ranges" icon="file-lines" href="/configs/parse/page-ranges">
    Control which pages get processed at each step.
  </Card>

  <Card title="Async Processing" icon="clock" href="/workflows/async-overview">
    Handle large documents without timeouts.
  </Card>
</CardGroup>