The split.run() method divides documents into sections based on descriptions you provide. You define what sections to look for, and Split identifies which pages belong to each section.

Basic Usage

from pathlib import Path
from reducto import Reducto

client = Reducto()

# Upload
upload = client.upload(file=Path("document.pdf"))

# Split the document - split_description is required
result = client.split.run(
    input=upload.file_id,
    split_description=[
        {"name": "Summary", "description": "Executive summary or overview section"},
        {"name": "Financial Data", "description": "Tables with financial figures"},
        {"name": "Notes", "description": "Footnotes or additional notes"}
    ]
)

# Access splits
for split in result.result.splits:
    print(f"Section: {split.name}")
    print(f"Pages: {split.pages}")
    print(f"Confidence: {split.conf}")

Method Signature

# Called as client.split.run(...)
def run(
    input: str,
    split_description: list[dict],
    parsing: dict | None = None,
    settings: dict | None = None,
    split_rules: str | None = None
) -> SplitResponse

Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| input | str | Yes | File ID (reducto://...), URL, or jobid:// reference |
| split_description | list[dict] | Yes | List of sections to identify; each entry has a name, a description, and an optional partition_key |
| parsing | dict or None | No | Parse configuration (page range, OCR settings) |
| settings | dict or None | No | Split settings (e.g., table_cutoff) |
| split_rules | str or None | No | Natural language prompt describing rules for splitting |
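Because split_description is the only structured required parameter, it can help to sanity-check it before making a request. The sketch below is a hypothetical client-side helper, not part of the Reducto SDK: it checks only the shape documented in the table above (a non-empty name and description per entry, plus an optional partition_key string). The duplicate-name check is an assumption about what keeps results unambiguous, not a documented API rule.

```python
def validate_split_description(split_description: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the entries look well-formed.

    Hypothetical helper (not part of the Reducto SDK). It mirrors the documented
    entry shape: required "name" and "description", optional "partition_key".
    """
    problems: list[str] = []
    if not split_description:
        problems.append("split_description must contain at least one entry")
    seen_names: set[str] = set()
    for i, entry in enumerate(split_description):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: must be a dict")
            continue
        # name and description are required, non-empty strings
        for key in ("name", "description"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {i}: '{key}' must be a non-empty string")
        # partition_key is optional, but must be a string when present
        partition_key = entry.get("partition_key")
        if partition_key is not None and not isinstance(partition_key, str):
            problems.append(f"entry {i}: 'partition_key' must be a string if given")
        # Assumption: distinct names keep the returned splits unambiguous
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen_names:
                problems.append(f"entry {i}: duplicate name '{name}'")
            seen_names.add(name)
    return problems


problems = validate_split_description([
    {"name": "Summary", "description": "Executive summary or overview section"},
    {"name": "Notes", "description": ""},
])
print(problems)
```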

Split Description

The split_description parameter is required. Each entry defines a section to find:
split_description = [
    {
        "name": "Cover Page",
        "description": "Title page with company logo and report title"
    },
    {
        "name": "Table of Contents", 
        "description": "Page listing all sections with page numbers"
    },
    {
        "name": "Financial Statements",
        "description": "Balance sheet, income statement, and cash flow tables"
    }
]

result = client.split.run(
    input=upload.file_id,
    split_description=split_description
)
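If you keep section definitions in a simple name-to-description mapping (for example, loaded from a config file), a small builder can turn them into the list format shown above. This is a hypothetical convenience function, not an SDK API; it relies on Python dicts preserving insertion order, so the entries come out in the order you defined them.

```python
from __future__ import annotations


def split_description_from_mapping(
    sections: dict[str, str],
    partition_keys: dict[str, str] | None = None,
) -> list[dict]:
    """Build split_description entries from {name: description}.

    Hypothetical helper. partition_keys optionally maps a section name to the
    identifier description used for its partition_key.
    """
    partition_keys = partition_keys or {}
    entries = []
    for name, description in sections.items():
        entry = {"name": name, "description": description}
        if name in partition_keys:
            entry["partition_key"] = partition_keys[name]
        entries.append(entry)
    return entries


split_description = split_description_from_mapping(
    {
        "Cover Page": "Title page with company logo and report title",
        "Account Holdings": "Investment holdings for a specific account",
    },
    partition_keys={"Account Holdings": "account number"},
)
```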

With Partition Key

Use partition_key when a section type appears several times in a document and you want to group its pages by a specific identifier:
split_description = [
    {
        "name": "Account Holdings",
        "description": "Investment holdings for a specific account",
        "partition_key": "account_number"  # Group pages by account number
    }
]
The partition_key is a string describing what identifier to look for (e.g., “account number”, “patient ID”, “invoice number”). Split will find all instances of that section and group them by the identifier value it finds in the document.
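When partition_key is used, each split carries a partitions list whose entries have their own name, pages, and conf (see Split Object below). A minimal sketch for flattening that nesting into one row per partition follows. It assumes each partition's name holds the identifier value Split found, and it uses types.SimpleNamespace objects with made-up values as stand-ins for real response objects.

```python
from __future__ import annotations

from types import SimpleNamespace


def partition_rows(splits) -> list[tuple[str, str | None, list[int], str]]:
    """Flatten splits into (section, partition, pages, conf) rows.

    A split without partitions produces a single row with partition=None.
    """
    rows = []
    for split in splits:
        if split.partitions:
            for part in split.partitions:
                rows.append((split.name, part.name, list(part.pages), part.conf))
        else:
            rows.append((split.name, None, list(split.pages), split.conf))
    return rows


# Stand-in for one split from result.result.splits (sample values, not real output)
holdings = SimpleNamespace(
    name="Account Holdings",
    pages=[2, 3, 4],
    conf="high",
    partitions=[
        SimpleNamespace(name="ACCT-001", pages=[2, 3], conf="high"),
        SimpleNamespace(name="ACCT-002", pages=[4], conf="low"),
    ],
)

for row in partition_rows([holdings]):
    print(row)
```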

Split Rules

The split_rules parameter is a natural language prompt that controls how pages are assigned to sections. By default, a page can belong to more than one section only when it sits on a boundary between them. Pass your own rule to change this behavior:
# Allow pages to belong to multiple sections
result = client.split.run(
    input=upload.file_id,
    split_description=[...],
    split_rules="Pages can belong to multiple sections if they contain content from both."
)

# Force exclusive classification
result = client.split.run(
    input=upload.file_id,
    split_description=[...],
    split_rules="Each page must belong to exactly one section. Choose the most relevant section."
)
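Because split_rules is a natural language instruction rather than a hard constraint, it may be worth checking the output when you ask for exclusive classification. The sketch below lists every page that appears in more than one split. It reads only the documented name and pages fields; the SimpleNamespace objects are stand-ins for real response objects.

```python
from collections import defaultdict
from types import SimpleNamespace


def overlapping_pages(splits) -> dict[int, list[str]]:
    """Map each page claimed by more than one split to the names of those splits."""
    owners = defaultdict(list)
    for split in splits:
        for page in split.pages:
            owners[page].append(split.name)
    return {page: names for page, names in sorted(owners.items()) if len(names) > 1}


# Stand-ins for result.result.splits
splits = [
    SimpleNamespace(name="Summary", pages=[1, 2]),
    SimpleNamespace(name="Financial Data", pages=[2, 3, 4]),
]
print(overlapping_pages(splits))  # {2: ['Summary', 'Financial Data']}
```

An empty dict means no page was assigned to more than one section.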

Parsing Configuration

Configure how the document is parsed before splitting:
result = client.split.run(
    input=upload.file_id,
    split_description=[...],
    parsing={
        "settings": {
            "page_range": {"start": 1, "end": 20}
        }
    }
)

Response Structure

result = client.split.run(...)  # returns a SplitResponse

# Top-level fields
print(result.usage.num_pages) # int: Pages processed
print(result.usage.credits)   # float: Credits used

# Splits
for split in result.result.splits:
    print(split.name)         # str: Section name (from split_description)
    print(split.pages)        # list[int]: Page numbers in this section (1-indexed)
    print(split.conf)         # str: Confidence level ("high" or "low")
    print(split.partitions)   # list | None: Sub-sections when partition_key is used
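The splits are not necessarily listed in the order their pages appear in the document. If you want to walk the sections in reading order, one option is to sort by each split's lowest page number, as sketched below. Splits with an empty pages list (sections that were not found) are placed last. The SimpleNamespace objects stand in for real response objects.

```python
import math
from types import SimpleNamespace


def in_document_order(splits) -> list:
    """Return splits sorted by their first (lowest) page; empty splits go last."""
    return sorted(splits, key=lambda s: min(s.pages) if s.pages else math.inf)


# Stand-ins for result.result.splits
splits = [
    SimpleNamespace(name="Notes", pages=[9, 10]),
    SimpleNamespace(name="Summary", pages=[1]),
    SimpleNamespace(name="Financial Data", pages=[]),
]
for split in in_document_order(splits):
    print(split.name, split.pages)
```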

Split Object

Each split contains:
  • name (str): The section name you defined
  • pages (list[int]): Page numbers belonging to this section (1-indexed)
  • conf (str): Confidence level ("high" or "low")
  • partitions (list | None): When using partition_key, contains sub-sections with their own name, pages, and conf
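Since conf is either "high" or "low", a simple pass over the splits can collect the sections worth a manual look. The sketch below is a hypothetical review helper: it flags low-confidence splits and, when partitions are present, low-confidence partitions as "section / partition". The SimpleNamespace objects and partition names are made-up stand-ins.

```python
from types import SimpleNamespace


def needs_review(splits) -> list[str]:
    """Return labels for splits and partitions whose conf is "low"."""
    labels = []
    for split in splits:
        if split.conf == "low":
            labels.append(split.name)
        # partitions may be None when partition_key was not used
        for part in split.partitions or []:
            if part.conf == "low":
                labels.append(f"{split.name} / {part.name}")
    return labels


# Stand-ins for result.result.splits (sample values)
splits = [
    SimpleNamespace(name="Summary", pages=[1], conf="high", partitions=None),
    SimpleNamespace(
        name="Account Holdings",
        pages=[2, 3],
        conf="high",
        partitions=[
            SimpleNamespace(name="ACCT-001", pages=[2], conf="high"),
            SimpleNamespace(name="ACCT-002", pages=[3], conf="low"),
        ],
    ),
    SimpleNamespace(name="Notes", pages=[4], conf="low", partitions=None),
]
print(needs_review(splits))
```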

Error Handling

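from pathlib import Path

import reducto
from reducto import Reducto

client = Reducto()
upload = client.upload(file=Path("document.pdf"))

try:
    result = client.split.run(
        input=upload.file_id,
        split_description=[{"name": "Summary", "description": "..."}]
    )
except reducto.APIConnectionError as e:
    # Could not reach the API (e.g., a network failure)
    print(f"Connection failed: {e}")
except reducto.APIStatusError as e:
    # The API responded with an error status code
    print(f"Split failed: {e.status_code} - {e.response}")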
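Connection failures are often transient, while an error status usually points to a problem with the request itself, so a common pattern is to retry only the former. The sketch below is a generic retry-with-exponential-backoff wrapper, not a Reducto API. The exception types to retry are passed in so the helper does not depend on the SDK. Check whether your client already retries requests internally before adding this layer.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call fn(); on an exception listed in retry_on, wait and try again.

    Waits base_delay, then 2 * base_delay, and so on (exponential backoff).
    Any other exception propagates immediately.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise AssertionError("not reached: the loop returns or re-raises")


# Example usage (assumes `client` and `upload` from the snippets above):
#
# import reducto
# result = call_with_retries(
#     lambda: client.split.run(
#         input=upload.file_id,
#         split_description=[{"name": "Summary", "description": "Executive summary"}],
#     ),
#     retry_on=(reducto.APIConnectionError,),
# )
```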

Complete Example

from pathlib import Path
from reducto import Reducto

client = Reducto()

# Upload
upload = client.upload(file=Path("financial-statement.pdf"))

# Define sections to find
split_description = [
    {
        "name": "Account Summary",
        "description": "Overview of account balances and holdings"
    },
    {
        "name": "Holdings Detail",
        "description": "Detailed list of individual holdings with values"
    },
    {
        "name": "Transaction History",
        "description": "Recent transactions and activity"
    }
]

# Split the document
result = client.split.run(
    input=upload.file_id,
    split_description=split_description
)

# Process results
print(f"Found {len(result.result.splits)} sections")

for split in result.result.splits:
    print(f"\n=== {split.name} ===")
    print(f"Pages: {split.pages}")
    print(f"Confidence: {split.conf}")
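A page that matches none of your descriptions will not appear in any split, so after a run you may want to see which pages went unclaimed. The sketch below assumes the whole document was processed, so pages run from 1 to result.usage.num_pages; if you restricted parsing with a page_range, adjust the range accordingly. The SimpleNamespace objects stand in for real response objects.

```python
from types import SimpleNamespace


def unassigned_pages(splits, num_pages: int) -> list[int]:
    """Return the 1-indexed pages (1..num_pages) that no split claims."""
    covered = {page for split in splits for page in split.pages}
    return [page for page in range(1, num_pages + 1) if page not in covered]


# With a real response: unassigned_pages(result.result.splits, result.usage.num_pages)
splits = [
    SimpleNamespace(name="Account Summary", pages=[1, 2]),
    SimpleNamespace(name="Holdings Detail", pages=[4]),
]
print(unassigned_pages(splits, 5))  # [3, 5]
```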

Chaining with Extract

A common pattern is to split a document then extract different schemas from each section:
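# Hypothetical helper: map each section name to the JSON schema to extract from it.
# Replace these placeholder schemas with your own.
SCHEMAS = {
    "Summary": {"type": "object", "properties": {"total_value": {"type": "number"}}},
    "Holdings": {"type": "object", "properties": {"holdings": {"type": "array", "items": {"type": "object"}}}},
}

def get_schema_for(section_name: str) -> dict:
    return SCHEMAS[section_name]

# Split first
split_result = client.split.run(
    input=upload.file_id,
    split_description=[
        {"name": "Summary", "description": "Account summary"},
        {"name": "Holdings", "description": "Holdings table"}
    ]
)

# Extract from each section using page ranges
for split in split_result.result.splits:
    if split.pages:  # skip sections that were not found
        extract_result = client.extract.run(
            input=f"jobid://{split_result.job_id}",  # jobid:// reference to the split job
            instructions={"schema": get_schema_for(split.name)},
            parsing={
                "settings": {
                    # Covers every page from the section's first page to its last
                    "page_range": {
                        "start": split.pages[0],
                        "end": split.pages[-1]
                    }
                }
            }
        )
        print(f"{split.name}: {extract_result.result}")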
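The loop above passes split.pages[0] and split.pages[-1] as the page range, so a section whose pages are not contiguous (say 3, 4 and 9) would also pull in pages 5 through 8. One way around this is to collapse the page list into contiguous runs and issue one page_range per run. A minimal sketch follows, assuming page_range's end is inclusive, as the loop above implies.

```python
def contiguous_runs(pages: list[int]) -> list[tuple[int, int]]:
    """Collapse page numbers into sorted, inclusive (start, end) runs."""
    runs: list[tuple[int, int]] = []
    for page in sorted(set(pages)):
        if runs and page == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], page)  # extend the current run
        else:
            runs.append((page, page))  # start a new run
    return runs


# Each run could become its own page_range, e.g. for pages [3, 4, 9]:
for start, end in contiguous_runs([3, 4, 9]):
    page_range = {"start": start, "end": end}
    print(page_range)
```

This trades one extract call per section for one call per run, so it only pays off when sections are actually fragmented.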

Best Practices

Write Clear Descriptions

Detailed section descriptions improve classification accuracy.

Use Partition Keys

When a section type repeats, set partition_key to a string naming the identifier that distinguishes each instance (e.g., “account number”).

Next Steps