The split.run() method divides documents into sections based on descriptions you provide. You define what sections to look for, and Split identifies which pages belong to each section.
Basic Usage
from pathlib import Path
from reducto import Reducto

client = Reducto()

# Upload
upload = client.upload(file=Path("document.pdf"))

# Split the document - split_description is required
result = client.split.run(
    input=upload.file_id,
    split_description=[
        {"name": "Summary", "description": "Executive summary or overview section"},
        {"name": "Financial Data", "description": "Tables with financial figures"},
        {"name": "Notes", "description": "Footnotes or additional notes"}
    ]
)

# Access splits
for split in result.result.splits:
    print(f"Section: {split.name}")
    print(f"Pages: {split.pages}")
    print(f"Confidence: {split.conf}")
Method Signature
def split.run(
    input: str,
    split_description: list[dict],
    parsing: dict | None = None,
    settings: dict | None = None,
    split_rules: str | None = None
) -> SplitResponse
Parameters
input (str, required): File ID (reducto://...), URL, or jobid:// reference
split_description (list[dict], required): List of sections to identify, each with name, description, and optional partition_key
parsing (dict | None, optional): Parse configuration (page range, OCR settings)
settings (dict | None, optional): Split settings (e.g., table_cutoff)
split_rules (str | None, optional): Natural language prompt describing rules for splitting
Split Description
The split_description parameter is required. Each entry defines a section to find:
split_description = [
    {
        "name": "Cover Page",
        "description": "Title page with company logo and report title"
    },
    {
        "name": "Table of Contents",
        "description": "Page listing all sections with page numbers"
    },
    {
        "name": "Financial Statements",
        "description": "Balance sheet, income statement, and cash flow tables"
    }
]

result = client.split.run(
    input=upload.file_id,
    split_description=split_description
)
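Since every entry needs a name and a description, it can help to check the list locally before sending a request. The following is a minimal sketch: `validate_split_description` is a hypothetical helper, not part of the Reducto SDK, and the duplicate-name check is an assumption rather than a documented requirement.

```python
def validate_split_description(entries):
    """Return a list of human-readable problems; an empty list means the entries look OK.

    Hypothetical helper for illustration, not part of the Reducto SDK.
    """
    problems = []
    seen_names = set()
    for index, entry in enumerate(entries):
        # name and description must both be non-empty strings
        for key in ("name", "description"):
            value = entry.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"entry {index}: missing or empty '{key}'")
        # partition_key is optional, but should be a string when present
        if "partition_key" in entry and not isinstance(entry["partition_key"], str):
            problems.append(f"entry {index}: 'partition_key' must be a string")
        # Assumption: duplicate names would make results ambiguous, so flag them
        name = entry.get("name")
        if isinstance(name, str):
            if name in seen_names:
                problems.append(f"entry {index}: duplicate name '{name}'")
            seen_names.add(name)
    return problems


problems = validate_split_description([
    {"name": "Cover Page", "description": "Title page with company logo and report title"},
    {"name": "Cover Page"},  # duplicate name and no description
])
for problem in problems:
    print(problem)
```

Returning a list of problems rather than raising on the first one lets you report everything that needs fixing in a single pass.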
With Partition Key
Use partition_key when a section type repeats multiple times and you want to group by a specific identifier:
split_description = [
    {
        "name": "Account Holdings",
        "description": "Investment holdings for a specific account",
        "partition_key": "account_number"  # Group pages by account number
    }
]
The partition_key is a string describing what identifier to look for (e.g., “account number”, “patient ID”, “invoice number”). Split will find all instances of that section and group them by the identifier value it finds in the document.
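As a rough illustration of consuming partitioned results, the sketch below maps each partition to its pages. `FakeSplit` is a stand-in dataclass that only mirrors the fields described on this page (name, pages, conf, partitions); it is not an SDK type, `pages_by_partition` is a hypothetical helper, and the sample values (including the use of account numbers as partition names) are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeSplit:
    """Stand-in for a split object; mirrors only the fields described on this page."""
    name: str
    pages: list
    conf: str
    partitions: Optional[list] = None


def pages_by_partition(split):
    """Map each partition name to its pages; fall back to the split itself if unpartitioned."""
    if not split.partitions:
        return {split.name: list(split.pages)}
    return {part.name: list(part.pages) for part in split.partitions}


# Made-up data: one section that repeats for two accounts
holdings = FakeSplit(
    name="Account Holdings",
    pages=[2, 3, 4, 5],
    conf="high",
    partitions=[
        FakeSplit("12345", [2, 3], "high"),
        FakeSplit("67890", [4, 5], "high"),
    ],
)
grouped = pages_by_partition(holdings)
print(grouped)
```

With a real response, the same function could be applied to each item in `result.result.splits`.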
Split Rules
The split_rules parameter is a natural language prompt that controls how pages are classified. By default, a page can belong to multiple sections only at section boundaries. You can override this behavior with your own rule:
# Allow pages to belong to multiple sections
result = client.split.run(
    input=upload.file_id,
    split_description=[...],
    split_rules="Pages can belong to multiple sections if they contain content from both."
)

# Force exclusive classification
result = client.split.run(
    input=upload.file_id,
    split_description=[...],
    split_rules="Each page must belong to exactly one section. Choose the most relevant section."
)
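When experimenting with split_rules, one way to check whether a rule had the intended effect is to list the pages that ended up in more than one section. This is a minimal sketch using made-up data; `pages_in_multiple_sections` is a hypothetical helper that takes plain (name, pages) pairs, not an SDK function.

```python
from collections import defaultdict


def pages_in_multiple_sections(sections):
    """Given (name, pages) pairs, return {page: [section names]} for pages in 2+ sections."""
    owners = defaultdict(list)
    for name, pages in sections:
        for page in pages:
            owners[page].append(name)
    return {page: names for page, names in sorted(owners.items()) if len(names) > 1}


# Made-up data in the shape of (split.name, split.pages)
overlaps = pages_in_multiple_sections([
    ("Summary", [1, 2]),
    ("Financial Data", [2, 3, 4]),
    ("Notes", [5]),
])
print(overlaps)
```

With a real response you could pass `((s.name, s.pages) for s in result.result.splits)`. An empty result means every page was assigned to at most one section.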
Parsing Configuration
Configure how the document is parsed before splitting:
result = client.split.run(
    input=upload.file_id,
    split_description=[...],
    parsing={
        "settings": {
            "page_range": {"start": 1, "end": 20}
        }
    }
)
Response Structure
result: SplitResponse = client.split.run(...)

# Top-level fields
print(result.usage.num_pages)  # int: Pages processed
print(result.usage.credits)    # float: Credits used

# Splits
for split in result.result.splits:
    print(split.name)        # str: Section name (from split_description)
    print(split.pages)       # list[int]: Page numbers in this section (1-indexed)
    print(split.conf)        # str: Confidence level ("high" or "low")
    print(split.partitions)  # list | None: Sub-sections when partition_key is used
Split Object
Each split contains:
name (str): The section name you defined
pages (list[int]): Page numbers belonging to this section (1-indexed)
conf (str): Confidence level ("high" or "low")
partitions (list | None): When using partition_key, contains sub-sections with their own name, pages, and conf
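Because conf is either "high" or "low", a simple way to use it is to route low-confidence sections to manual review. The sketch below uses `SimpleNamespace` objects as stand-ins for split objects; `partition_by_confidence` is a hypothetical helper, and the sample values are made up.

```python
from types import SimpleNamespace


def partition_by_confidence(splits):
    """Separate splits into (trusted, needs_review) based on the conf field."""
    trusted, needs_review = [], []
    for split in splits:
        # Treat anything other than "high" as needing review, to stay conservative
        if split.conf == "high":
            trusted.append(split)
        else:
            needs_review.append(split)
    return trusted, needs_review


# Made-up stand-ins for items in result.result.splits
sample = [
    SimpleNamespace(name="Summary", pages=[1], conf="high"),
    SimpleNamespace(name="Notes", pages=[9], conf="low"),
]
trusted, needs_review = partition_by_confidence(sample)
print([s.name for s in needs_review])
```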
Error Handling
from reducto import Reducto
import reducto

try:
    result = client.split.run(
        input=upload.file_id,
        split_description=[{"name": "Summary", "description": "..."}]
    )
except reducto.APIConnectionError as e:
    print(f"Connection failed: {e}")
except reducto.APIStatusError as e:
    print(f"Split failed: {e.status_code} - {e.response}")
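Connection failures are often transient, so one option is to retry them with exponential backoff. The sketch below is deliberately generic: `call_with_retries` is a hypothetical helper, not an SDK feature, and it takes the exception types to retry as an argument so it can be demonstrated without network access. It is worth checking whether the client already retries internally before layering this on top.

```python
import time


def call_with_retries(func, retry_on, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(); on an exception listed in retry_on, wait and retry with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))  # 1x, 2x, 4x, ... the base delay


# Hypothetical usage with the exception type from the example above:
# result = call_with_retries(
#     lambda: client.split.run(input=upload.file_id, split_description=split_description),
#     retry_on=(reducto.APIConnectionError,),
# )
```

Only connection errors are retried in the usage shown; status errors usually indicate a problem with the request itself and are left to propagate.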
Complete Example
from pathlib import Path
from reducto import Reducto

client = Reducto()

# Upload
upload = client.upload(file=Path("fidelity-example.pdf"))

# Define sections to find
split_description = [
    {
        "name": "Account Summary",
        "description": "Overview of account balances and holdings"
    },
    {
        "name": "Holdings Detail",
        "description": "Detailed list of individual holdings with values"
    },
    {
        "name": "Transaction History",
        "description": "Recent transactions and activity"
    }
]

# Split the document
result = client.split.run(
    input=upload.file_id,
    split_description=split_description
)

# Process results
print(f"Found {len(result.result.splits)} sections")
for split in result.result.splits:
    print(f"\n=== {split.name} ===")
    print(f"Pages: {split.pages}")
    print(f"Confidence: {split.conf}")
A common pattern is to split a document, then extract a different schema from each section:
# Split first
split_result = client.split.run(
    input=upload.file_id,
    split_description=[
        {"name": "Summary", "description": "Account summary"},
        {"name": "Holdings", "description": "Holdings table"}
    ]
)

# Extract from each section using page ranges
# get_schema_for is a user-defined helper (not shown) that returns the schema for a section name
for split in split_result.result.splits:
    if split.pages:
        extract_result = client.extract.run(
            input=f"jobid://{split_result.job_id}",
            instructions={"schema": get_schema_for(split.name)},
            parsing={
                "settings": {
                    "page_range": {
                        "start": split.pages[0],
                        "end": split.pages[-1]
                    }
                }
            }
        )
        print(f"{split.name}: {extract_result.result}")
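The loop above passes the first and last page of each section as a single range, which covers every page in between. If a section's pages are not consecutive (an assumption; this page does not say whether that can happen), that range would also include pages from other sections. A minimal sketch of a hypothetical helper, `contiguous_ranges`, that collapses a page list into runs of consecutive pages:

```python
def contiguous_ranges(pages):
    """Collapse page numbers into a list of {"start": a, "end": b} runs of consecutive pages."""
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1]["end"] + 1:
            ranges[-1]["end"] = page  # extend the current run
        else:
            ranges.append({"start": page, "end": page})  # begin a new run
    return ranges


# Made-up page list with two gaps
runs = contiguous_ranges([3, 4, 5, 9, 10, 14])
print(runs)
```

Each resulting dict has the same start/end shape as the page_range setting shown above, so one extract call could be made per run instead of one per section.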
Best Practices
Write Clear Descriptions: Detailed section descriptions improve classification accuracy.
Use Partition Keys: Use partition_key with a string identifier when sections repeat multiple times.