What is Deep Split?

Deep Split is an agentic splitting mode that iteratively refines its output to achieve near-perfect accuracy. Unlike standard split, which classifies each page in a single pass, Deep Split runs an agentic loop that verifies and corrects its section assignments against the source document until a quality threshold is met. This is especially useful for complex documents, where a single split pass may mislabel pages, miss boundaries between similar sections, or partition repeating sections inconsistently. Deep Split catches these issues by checking its own work and re-classifying pages until the assignments pass verification.
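
The difference between a single pass and a verify-and-correct loop can be illustrated with a toy sketch. This is not Reducto's implementation, which is not described here; the function `refine_assignments`, its callbacks (`classify`, `verify`, `reclassify`), and the `threshold` and `max_rounds` parameters are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, List


def refine_assignments(
    pages: List[str],
    classify: Callable[[str], str],
    verify: Callable[[str, str], bool],
    reclassify: Callable[[str, str], str],
    threshold: float = 1.0,
    max_rounds: int = 5,
) -> Dict[int, str]:
    """Assign a section to each page, then re-check until enough pages verify.

    Hypothetical illustration of a verify-and-correct loop; all names here
    are assumptions, not part of any real API.
    """
    # Single pass: on its own, this is what a one-shot split would produce.
    assignments = {i: classify(text) for i, text in enumerate(pages)}

    for _ in range(max_rounds):
        # Check every assignment against the page it was derived from.
        failed = [i for i, text in enumerate(pages) if not verify(text, assignments[i])]
        verified_fraction = 1 - len(failed) / max(len(pages), 1)
        if verified_fraction >= threshold:
            break  # quality threshold met
        # Correct only the pages that failed verification.
        for i in failed:
            assignments[i] = reclassify(pages[i], assignments[i])

    return assignments


# Toy demo: a deliberately poor first pass that labels every page "Holdings".
pages = [
    "Holdings: AAPL 10 shares",
    "Transactions: 2024-01-02 BUY",
    "Holdings: MSFT 5 shares",
]
result = refine_assignments(
    pages,
    classify=lambda text: "Holdings",
    verify=lambda text, label: text.startswith(label),
    reclassify=lambda text, label: text.split(":")[0],
)
print(result)
```

In the demo, the first pass mislabels the second page, verification flags it, and the next round corrects it before the loop exits.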

When to Use It

Deep Split is designed for splitting tasks where accuracy is critical and the cost of errors is high. Common use cases include:
  • Consolidated financial statements, where holdings, transactions, and summaries repeat across many accounts and must be partitioned correctly.
  • Insurance claim packets, where sections like medical records, billing statements, and adjuster notes are visually similar and easy to confuse.
  • Loan and mortgage files, where dozens of disclosures, appraisals, and supporting documents are interleaved and ordering matters for downstream processing.
  • Multi-patient medical record bundles, where intake forms, lab reports, and discharge summaries repeat per patient and must be grouped cleanly by patient.
  • Long legal binders, where exhibits, contracts, and addenda span hundreds of pages and section boundaries are ambiguous.
If your splitting task is simple (a handful of clearly distinct sections in a short document), standard split is sufficient. Use Deep Split when you need high reliability on complex or lengthy documents.
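
One way to encode this guidance is a small helper that decides whether to turn the flag on. The sketch below is only a heuristic under stated assumptions: the cutoffs `MAX_SIMPLE_PAGES` and `MAX_SIMPLE_SECTIONS` and the function `choose_split_settings` are hypothetical, and it assumes that passing `deep_split` as `False` is equivalent to omitting it.

```python
from typing import Any, Dict, List

# Illustrative cutoffs only (assumptions); tune them to your own documents.
MAX_SIMPLE_PAGES = 20
MAX_SIMPLE_SECTIONS = 4


def choose_split_settings(
    page_count: int, split_description: List[Dict[str, str]]
) -> Dict[str, Any]:
    """Return a settings dict that enables deep_split for complex inputs."""
    # Repeating sections (those with a partition_key) are treated as complex.
    has_repeating = any("partition_key" in section for section in split_description)
    is_complex = (
        page_count > MAX_SIMPLE_PAGES
        or len(split_description) > MAX_SIMPLE_SECTIONS
        or has_repeating
    )
    return {"deep_split": is_complex}


simple = choose_split_settings(
    5, [{"name": "Invoice", "description": "Single invoice page"}]
)
complex_doc = choose_split_settings(
    300,
    [{"name": "Holdings", "description": "Holdings table", "partition_key": "account_number"}],
)
print(simple, complex_doc)
```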

How to Use It

Enable Deep Split by setting deep_split to true inside the settings object:
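# Assumes `client` is an initialized Reducto SDK client and `upload` is the
# result of a prior file upload; neither is shown in this snippet.
result = client.split.run(
    input=upload.file_id,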
    split_description=[
        {
            "name": "Account Summary",
            "description": "Overview section with balances and totals at the top of the statement"
        },
        {
            "name": "Holdings",
            "description": "Investment holdings table for a specific account",
            "partition_key": "account_number"
        },
        {
            "name": "Transaction History",
            "description": "Table of individual transactions with dates, descriptions, and amounts"
        }
    ],
    settings={"deep_split": True}
)

Best Practices

Write specific, distinguishing section descriptions

The agentic loop relies on your split_description entries to verify whether each page is in the right section. Vague descriptions give the agent nothing concrete to check, so include content, position, and visual cues that distinguish each section from its neighbors.
split_description=[
    {
        "name": "Transaction History",
        "description": (
            "Table showing individual transactions with dates, descriptions, and amounts. "
            "Appears after the account summary and before the holdings detail. "
            "Does NOT include opening or closing balance rows."
        )
    }
]
Other examples of effective distinguishing criteria:
  • Repeating sections: “Holdings table for a single account. Each block starts with an account number header and ends before the next account header.”
  • Visually similar sections: “Lab report. Contains the laboratory name in the header and a results table with reference ranges. Distinct from imaging reports, which contain narrative findings instead of tables.”
  • Boundary-sensitive sections: “Signature page. Always the last page of the contract block, immediately before any exhibits.”
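
Before sending a request, it can help to check descriptions for obvious vagueness. The following is a minimal sketch of such a check, not a feature of the API: `lint_split_description` and the `MIN_WORDS` cutoff are hypothetical, and a word count is only a rough proxy for how distinguishing a description is.

```python
from typing import Dict, List

MIN_WORDS = 8  # arbitrary cutoff (assumption); short descriptions are often vague


def lint_split_description(sections: List[Dict[str, str]]) -> List[str]:
    """Return human-readable warnings for weak split_description entries."""
    warnings: List[str] = []
    seen_names = set()
    for section in sections:
        name = section.get("name", "").strip()
        description = section.get("description", "").strip()
        if not name:
            warnings.append("A section is missing a name.")
        elif name.lower() in seen_names:
            warnings.append(f"Duplicate section name: {name}")
        seen_names.add(name.lower())
        if len(description.split()) < MIN_WORDS:
            label = name or "<unnamed>"
            warnings.append(
                f"{label}: description is under {MIN_WORDS} words and may be too vague."
            )
    return warnings


sections = [
    {
        "name": "Transaction History",
        "description": (
            "Table showing individual transactions with dates, descriptions, "
            "and amounts. Appears after the account summary."
        ),
    },
    {"name": "Holdings", "description": "Holdings table"},
]
warnings = lint_split_description(sections)
print(warnings)
```

Here only the "Holdings" entry is flagged, because its two-word description gives no content, position, or visual cues to verify against.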

Use with partition_key for repeating sections

When the same section repeats for different entities (multiple accounts, multiple patients, multiple companies), pair Deep Split with partition_key so the agent verifies both the section assignment and the partition value extracted from the page.
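
As a sketch of what this might look like for a multi-patient bundle, the configuration below reuses the `split_description` and `settings` fields shown earlier. The section names, descriptions, and the `"patient_id"` key are illustrative assumptions, not values taken from the API reference.

```python
# Sections that repeat per patient carry a partition_key; the one-off cover
# sheet does not. "patient_id" is a hypothetical key chosen for this example.
split_description = [
    {
        "name": "Cover Sheet",
        "description": "Single page at the start of the bundle listing all patients included.",
    },
    {
        "name": "Intake Form",
        "description": "Patient intake form with demographics and insurance details for one patient.",
        "partition_key": "patient_id",
    },
    {
        "name": "Lab Report",
        "description": (
            "Lab report with the laboratory name in the header and a results "
            "table with reference ranges for one patient."
        ),
        "partition_key": "patient_id",
    },
    {
        "name": "Discharge Summary",
        "description": "Narrative discharge summary for one patient, ending with follow-up instructions.",
        "partition_key": "patient_id",
    },
]
settings = {"deep_split": True}

# Quick sanity check of which sections will be partitioned.
partitioned = [s["name"] for s in split_description if "partition_key" in s]
print(partitioned)

# The request itself would follow the earlier example (client and upload assumed):
# result = client.split.run(
#     input=upload.file_id,
#     split_description=split_description,
#     settings=settings,
# )
```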

Pair with Parse configuration

Deep Split can only verify what Parse sees. If the underlying parse output is missing data (for example, a table is not detected or a header is misread), Deep Split cannot recover the missing signal. When section identifiers live deep inside tables or in low-quality scans, consider enabling agentic mode for tables or using a higher-fidelity OCR mode.

Split Overview

Endpoint basics and parameters.

Split Configuration

split_description, partition_key, and table_cutoff.

Deep Extract

The same agentic loop pattern for schema-based extraction.