Overview
Split is an endpoint designed for long documents whose sections require different treatments or extraction schemas. It uses a natural language prompt to divide a document into meaningful page ranges. This step is typically run after parsing and before extraction to make the process more targeted and effective.
Example Use Cases
- Financial 10-K report with sections like “Business Overview,” “Financial Data,” and “Corporate Governance”.
- Multi-page medical record book with intake packets of several patients.
- Employee onboarding packet with sections like policies, benefits, tax forms, and more.
Key features
Under the hood, split runs a parsing call followed by LLM calls to determine which pages match the description you provide.
Key Inputs:
- split_description
  - name: Name of the category of pages you’re trying to extract
  - description: Details describing the characteristics of the pages you want
  - partition_key (optional): Identifier to split sections by. In the output, the partition_key is filled in with values from the content and appended after the split_description name.
    - Examples: account number, index, name, holding type. If your partition key is “account number” and the split description name is “account holdings”, then the output mappings will be “account holdings 111”, “account holdings 112”, etc.
- split_rules
  - Optional field for changing the default rules
  - Example: Each page can only be classified into a single category
Key Outputs:
- splits
  - List of objects with name, pages, conf, and partitions fields
    - name: Comes from the names filled in for split_description
    - pages: Pages for this category
    - conf: Confidence for the categorization, either “low” or “high”
    - partitions: Either a list of objects with name, conf, and pages, or null if partition_key is not defined
      - name: The value filled in from the content
      - pages: List of pages for this partition
      - conf: Confidence for the categorization of this partition, either “low” or “high”
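The fields described above can be sketched as plain Python data. This is an illustration only: it assumes split_description is sent as a list of objects, the sample response is hand-written rather than real API output, and the helper names `flatten_splits` and `low_confidence` are hypothetical.

```python
# Request side: one entry per category of pages you want back.
# Field names follow the list above; sending them as a list of
# objects is an assumption, and the request envelope is omitted.
split_description = [
    {
        "name": "account holdings",
        "description": "Pages listing the holdings of a single account",
        "partition_key": "account number",  # optional
    },
    {
        "name": "disclosures",
        "description": "Legal disclosures and footnotes",
    },
]

# Response side: hand-written sample shaped like the `splits` list.
# The page numbers and partition values are made up for illustration.
sample_splits = [
    {
        "name": "account holdings",
        "pages": [2, 3, 4, 5],
        "conf": "high",
        "partitions": [
            {"name": "111", "pages": [2, 3], "conf": "high"},
            {"name": "112", "pages": [4, 5], "conf": "low"},
        ],
    },
    {"name": "disclosures", "pages": [6], "conf": "high", "partitions": None},
]


def flatten_splits(splits):
    """Return one (category, partition, pages, conf) row per split or partition."""
    rows = []
    for entry in splits:
        partitions = entry.get("partitions")
        if not partitions:
            # No partition_key was defined, so report the category itself.
            rows.append((entry["name"], None, entry["pages"], entry["conf"]))
            continue
        for part in partitions:
            rows.append((entry["name"], part["name"], part["pages"], part["conf"]))
    return rows


def low_confidence(rows):
    """Keep only the rows whose confidence is "low"."""
    return [row for row in rows if row[3] == "low"]
```

Flattening first makes it easy to route low-confidence rows to manual review before any downstream extraction runs.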
Pipelining with Split
Split is powerful when used in conjunction with other endpoints, specifically extraction. When each subsection of a long document needs a different schema, splitting breaks the document into logically grouped subsets. This process is known as pipelining.
Sample scenarios
Scenario 1:
You have a long document, and the information you plan to extract appears on only a subset of the pages.
- Run /parse on the file and get the parse job id.
- Run /split on the parse output (feed in the parse job id).
- Run /extract on the parse output using the same document_url you sent into split. Set the page range to the pages returned from split.
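The three steps in Scenario 1 can be sketched as a small orchestration function. This is a minimal sketch under stated assumptions: `parse`, `split`, and `extract` are hypothetical callables that you would implement around the /parse, /split, and /extract endpoints, and their argument order and return shapes are illustrative, not the documented API.

```python
def pages_for(splits, name):
    """Return the sorted pages of the first split whose name matches, else []."""
    for entry in splits:
        if entry["name"] == name:
            return sorted(entry["pages"])
    return []


def run_scenario_one(parse, split, extract, file_url, split_description, target, schema):
    """Parse once, split the parse output, then extract only the target pages.

    parse(file_url) -> reference to the parse output (hypothetical wrapper)
    split(reference, split_description) -> list shaped like `splits`
    extract(reference, schema, pages) -> extraction result
    """
    # Step 1: parse the file and keep a reference to the parse output.
    document_ref = parse(file_url)
    # Step 2: split the parse output using the natural language descriptions.
    splits = split(document_ref, split_description)
    # Step 3: extract from the same reference, restricted to the matching pages.
    pages = pages_for(splits, target)
    if not pages:
        return None  # nothing matched, so skip the extract call
    return extract(document_ref, schema, pages)
```

Passing the endpoint wrappers in as arguments keeps the flow testable with stubs and avoids tying the sketch to any particular client library.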
Scenario 2:
You have a long document with information that can be clearly broken down into subsections.
An example of this would be a financial statement with multiple accounts, each with a unique account number. Here the account number can serve as the partition_key.
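For a document partitioned by account number, one way to use the result is to group pages per account and run the same extraction once per group. The sketch below assumes a `splits` list shaped as described under Key features; `pages_by_partition`, `extract_per_account`, and the `extract_pages` callable are hypothetical names, not part of the API.

```python
def pages_by_partition(splits, category):
    """Map each partition value (for example an account number) to its sorted pages."""
    for entry in splits:
        if entry["name"] == category:
            # partitions is null (None) when no partition_key was defined.
            partitions = entry.get("partitions") or []
            return {part["name"]: sorted(part["pages"]) for part in partitions}
    return {}


def extract_per_account(splits, category, extract_pages):
    """Call extract_pages(pages) once per partition and key the results by partition value.

    extract_pages is a hypothetical callable wrapping an /extract request
    that is restricted to the given pages.
    """
    grouped = pages_by_partition(splits, category)
    return {account: extract_pages(pages) for account, pages in grouped.items()}
```

Each account then gets its own extraction result, so one schema can be reused across every account in the statement.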