Reducto endpoints can be chained together to build multi-step document processing workflows. A common pattern is Classify first to determine document type, then Parse and Extract with the right configuration for that type. When you call Parse, Reducto returns a job_id that represents the parsed document. You can pass this job ID to subsequent Extract or Split calls using the jobid:// prefix, which skips re-parsing and uses the cached result. This saves both time and credits when you need to run multiple operations on the same document.

The jobid:// protocol

After parsing a document, the response includes a job_id:
from pathlib import Path
from reducto import Reducto

client = Reducto()

upload = client.upload(file=Path("document.pdf"))
parse_result = client.parse.run(input=upload)
print(parse_result.job_id)  # "7600c8c5-a52f-49d2-8a7d-d75d1b51e141"
To reuse this parsed content in Extract or Split, prefix the job ID with jobid://:
# Extract using the parsed document (no re-parsing)
extract_result = client.extract.run(
    input=f"jobid://{parse_result.job_id}",
    instructions={"schema": your_schema}
)
When Reducto sees jobid://, it retrieves the cached parse result instead of processing the document again. Any parsing options you include in the request are ignored since the document was already parsed.

Common chaining patterns

Parse → Extract

The most common pattern. Parse once, then run one or more extractions with different schemas:
from pathlib import Path
from reducto import Reducto

client = Reducto()

# Step 1: Upload and parse the document
upload = client.upload(file=Path("financial-report.pdf"))
parse_result = client.parse.run(input=upload)
job_id = parse_result.job_id

# Step 2: Extract summary metrics
summary = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": {
        "type": "object",
        "properties": {
            "total_revenue": {"type": "number"},
            "net_income": {"type": "number"}
        }
    }}
)

# Step 3: Extract detailed line items (same parsed document)
line_items = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": {
        "type": "object",
        "properties": {
            "expenses": {"type": "array", "items": {"type": "object"}}
        }
    }},
    settings={"array_extract": True}
)
Without chaining, each Extract call would re-parse the document. With chaining, you parse once and pay for parsing credits once.

Parse → Split → Extract

For documents with distinct sections that need different extraction schemas:
from pathlib import Path
from reducto import Reducto

client = Reducto()

# Step 1: Upload and parse
upload = client.upload(file=Path("contract.pdf"))
parse_result = client.parse.run(input=upload)
job_id = parse_result.job_id

# Step 2: Split into sections
split_result = client.split.run(
    input=f"jobid://{job_id}",
    split_description=[
        {"name": "Terms", "description": "Terms and conditions section"},
        {"name": "Pricing", "description": "Pricing and payment terms"},
        {"name": "SLA", "description": "Service level agreement"}
    ]
)

# Step 3: Extract from specific sections
for section in split_result.result.splits:
    if section.pages:
        extract_result = client.extract.run(
            input=f"jobid://{job_id}",
            instructions={"schema": get_schema_for_section(section.name)},
            parsing={"settings": {"page_range": {
                "start": section.pages[0],
                "end": section.pages[-1]
            }}}
        )
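The `get_schema_for_section` helper above is not part of the SDK; you would define it yourself. A minimal sketch, with illustrative schemas for the three contract sections:

```python
# Hypothetical helper: map each split section name to an extraction schema.
# The schemas below are illustrative examples, not part of the Reducto SDK.
SECTION_SCHEMAS = {
    "Terms": {
        "type": "object",
        "properties": {"termination_notice_days": {"type": "number"}},
    },
    "Pricing": {
        "type": "object",
        "properties": {"monthly_fee": {"type": "number"}},
    },
    "SLA": {
        "type": "object",
        "properties": {"uptime_percent": {"type": "number"}},
    },
}

def get_schema_for_section(name: str) -> dict:
    # Fall back to a permissive object schema for unrecognized section names.
    return SECTION_SCHEMAS.get(name, {"type": "object"})
```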

Classify → Parse → Extract

When you need to determine document type before choosing an extraction schema, use Classify first.
from reducto import Reducto

client = Reducto()

document_url = "https://example.com/document.pdf"  # replace with your document's URL

# Step 1: Classify the document type
classification = client.classify.run(
    input=document_url,
    classification_schema=[
        {
            "category": "invoice",
            "criteria": ["billing information", "itemized charges", "payment details"],
        },
        {
            "category": "receipt",
            "criteria": ["single transaction", "store or merchant name", "payment method"],
        },
        {
            "category": "purchase_order",
            "criteria": ["order number", "requested items", "delivery instructions"],
        },
    ],
)

doc_type = classification.result.category

# Step 2: Parse the document
parse_result = client.parse.run(input=document_url)
job_id = parse_result.job_id

# Step 3: Extract with the right schema for this document type
if doc_type == "invoice":
    schema = invoice_schema
elif doc_type == "receipt":
    schema = receipt_schema
else:
    schema = purchase_order_schema

result = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": schema}
)
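The if/elif routing in step 3 can also be written as a lookup table, which scales better as categories grow. A sketch, with placeholder schemas standing in for the `invoice_schema`, `receipt_schema`, and `purchase_order_schema` you would define for your documents:

```python
# Placeholder schemas; define these to match your actual documents.
invoice_schema = {"type": "object", "properties": {"invoice_number": {"type": "string"}}}
receipt_schema = {"type": "object", "properties": {"merchant": {"type": "string"}}}
purchase_order_schema = {"type": "object", "properties": {"order_number": {"type": "string"}}}

SCHEMAS_BY_TYPE = {
    "invoice": invoice_schema,
    "receipt": receipt_schema,
}

def schema_for(doc_type: str) -> dict:
    # Fall back to purchase_order_schema, mirroring the else branch above.
    return SCHEMAS_BY_TYPE.get(doc_type, purchase_order_schema)
```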
This pattern is useful when processing mixed document types from a single upload queue. Classify is purpose-built for document routing and, at 0.5 credits per page of context, is cheaper than using Extract as a routing workaround.

Multiple job IDs

Extract also accepts a list of job IDs, which combines the parsed content from multiple documents into a single extraction context:
# Parse multiple documents (documents can be URLs or uploaded file IDs)
job_ids = []
for doc in documents:
    result = client.parse.run(input=doc)
    job_ids.append(result.job_id)

# Extract across all documents
combined_result = client.extract.run(
    input=[f"jobid://{jid}" for jid in job_ids],
    instructions={"schema": aggregation_schema}
)
This behaves like a multi-document pipeline: the extraction sees all documents together and returns a single result. If you need fields from each individual document, design your schema accordingly.
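One way to keep per-document data distinct in a combined extraction is to shape the schema around an array. A hypothetical `aggregation_schema` for the example above, intended to yield one entry per source document:

```python
# Hypothetical aggregation schema: an array meant to hold one entry
# per source document in the combined extraction context.
aggregation_schema = {
    "type": "object",
    "properties": {
        "documents": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "total": {"type": "number"},
                },
            },
        }
    },
}
```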

Supported endpoints

| Endpoint | Accepts jobid:// | Notes |
| --- | --- | --- |
| Parse | Yes | Reprocesses with different settings |
| Extract | Yes | Single ID or list of IDs |
| Split | Yes | Single ID only |
| Classify | No | Accepts URLs and upload responses |
| Edit | No | Requires actual document URL |

Credit savings

When you use jobid://, you only pay parse credits once regardless of how many subsequent calls you make:
| Without chaining | With chaining |
| --- | --- |
| Parse (4 credits) | Parse (4 credits) |
| Extract #1 (4 + 2 credits) | Extract #1 (2 credits) |
| Extract #2 (4 + 2 credits) | Extract #2 (2 credits) |
| Total: 16 credits | Total: 8 credits |
The savings scale with document size and number of operations.
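The arithmetic in the table generalizes to any number of operations. A sketch using the table's example costs (4 credits per parse, 2 per extract; actual costs depend on page count and settings):

```python
def total_credits(n_extracts: int, parse_cost: int = 4, extract_cost: int = 2,
                  chained: bool = True) -> int:
    """Illustrative credit math for chained vs. unchained workflows.

    The 4-credit parse and 2-credit extract figures mirror the example
    table; real costs vary with document size and configuration.
    """
    if chained:
        # Parse once, then each extract pays only its own cost.
        return parse_cost + n_extracts * extract_cost
    # Unchained: each extract re-parses the document internally.
    return parse_cost + n_extracts * (parse_cost + extract_cost)
```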

Job ID retention

Parse job IDs are retained for 12 hours by default. If you need to chain calls after this window, you’ll need to re-parse the document. For workflows that span longer periods, consider storing the parsed content or using pipelines which handle this automatically.
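For long-lived workflows, one client-side option is to track when each document was parsed and re-parse once the retention window may have lapsed. A sketch (the `ParseCache` class and its 12-hour TTL are illustrative, not part of the SDK; it assumes a configured Reducto client):

```python
import time

JOB_TTL_SECONDS = 12 * 60 * 60  # default 12-hour retention window

class ParseCache:
    """Hypothetical client-side cache: re-parses a document when its
    cached job ID may have expired. Assumes `client` exposes
    `parse.run(input=...)` returning an object with a `job_id`."""

    def __init__(self, client, ttl: float = JOB_TTL_SECONDS):
        self.client = client
        self.ttl = ttl
        self._jobs = {}  # document key -> (job_id, parsed_at)

    def job_input(self, doc) -> str:
        key = str(doc)
        entry = self._jobs.get(key)
        if entry is None or time.time() - entry[1] > self.ttl:
            # No cached job, or it may have expired: parse again.
            result = self.client.parse.run(input=doc)
            entry = (result.job_id, time.time())
            self._jobs[key] = entry
        return f"jobid://{entry[0]}"
```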

Related

- Classify: Categorize documents by type before processing.
- Pipeline Basics: Bundle multi-step workflows into a single API call.
- Split: Divide documents into sections for targeted extraction.
- Extract: Pull structured data from parsed documents.