When you call Parse, Reducto returns a job_id that represents the parsed document. You can pass this job ID to subsequent Extract or Split calls using the jobid:// prefix, which skips re-parsing and uses the cached result. This saves both time and credits when you need to run multiple operations on the same document.

The jobid:// protocol

After parsing a document, the response includes a job_id:
from pathlib import Path
from reducto import Reducto

client = Reducto()

upload = client.upload(file=Path("document.pdf"))
parse_result = client.parse.run(input=upload)
print(parse_result.job_id)  # "7600c8c5-a52f-49d2-8a7d-d75d1b51e141"
To reuse this parsed content in Extract or Split, prefix the job ID with jobid://:
# Extract using the parsed document (no re-parsing)
extract_result = client.extract.run(
    input=f"jobid://{parse_result.job_id}",
    instructions={"schema": your_schema}
)
When Reducto sees jobid://, it retrieves the cached parse result instead of processing the document again. Any parsing options you include in the request are ignored since the document was already parsed.
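The prefix is plain string formatting, so a small helper can keep the parse-once convention in one place. The parse_once function below is an illustrative sketch, not part of the SDK:
from pathlib import Path

def parse_once(client, path: Path) -> str:
    # Parse a document and return a jobid:// string for later Extract or Split calls.
    upload = client.upload(file=path)
    parse_result = client.parse.run(input=upload)
    return f"jobid://{parse_result.job_id}"

document = parse_once(client, Path("document.pdf"))
extract_result = client.extract.run(input=document, instructions={"schema": your_schema})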

Common chaining patterns

Parse → Extract

The most common pattern. Parse once, then run one or more extractions with different schemas:
from pathlib import Path
from reducto import Reducto

client = Reducto()

# Step 1: Upload and parse the document
upload = client.upload(file=Path("financial-report.pdf"))
parse_result = client.parse.run(input=upload)
job_id = parse_result.job_id

# Step 2: Extract summary metrics
summary = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": {
        "type": "object",
        "properties": {
            "total_revenue": {"type": "number"},
            "net_income": {"type": "number"}
        }
    }}
)

# Step 3: Extract detailed line items (same parsed document)
line_items = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": {
        "type": "object",
        "properties": {
            "expenses": {"type": "array", "items": {"type": "object"}}
        }
    }},
    settings={"array_extract": True}
)
Without chaining, each Extract call would re-parse the document. With chaining, you parse once and pay for parsing credits once.
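If you routinely run several schemas against the same document, a small loop makes the parse-once pattern explicit. The schemas dictionary below is illustrative:
schemas = {
    "summary": {"type": "object", "properties": {"total_revenue": {"type": "number"}}},
    "line_items": {"type": "object", "properties": {"expenses": {"type": "array", "items": {"type": "object"}}}},
}

results = {}
for name, schema in schemas.items():
    # Each call reuses the cached parse result, so only extraction credits are charged.
    results[name] = client.extract.run(
        input=f"jobid://{job_id}",
        instructions={"schema": schema}
    )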

Parse → Split → Extract

For documents with distinct sections that need different extraction schemas:
from pathlib import Path

# Step 1: Upload and parse
upload = client.upload(file=Path("contract.pdf"))
parse_result = client.parse.run(input=upload)
job_id = parse_result.job_id

# Step 2: Split into sections
split_result = client.split.run(
    input=f"jobid://{job_id}",
    split_description=[
        {"name": "Terms", "description": "Terms and conditions section"},
        {"name": "Pricing", "description": "Pricing and payment terms"},
        {"name": "SLA", "description": "Service level agreement"}
    ]
)

# Step 3: Extract from specific sections
for section in split_result.result.splits:
    if section.pages:
        extract_result = client.extract.run(
            input=f"jobid://{job_id}",
            instructions={"schema": get_schema_for_section(section.name)},
            parsing={"settings": {"page_range": {
                "start": section.pages[0],
                "end": section.pages[-1]
            }}}
        )
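The get_schema_for_section call above is not an SDK function; it stands in for whatever mapping you keep from section names to extraction schemas. A minimal sketch with made-up field names:
def get_schema_for_section(name: str) -> dict:
    # Illustrative mapping from split section names to extraction schemas.
    schemas = {
        "Terms": {"type": "object", "properties": {"termination_notice_days": {"type": "number"}}},
        "Pricing": {"type": "object", "properties": {"monthly_fee": {"type": "number"}}},
        "SLA": {"type": "object", "properties": {"uptime_percentage": {"type": "number"}}},
    }
    return schemas.get(name, {"type": "object", "properties": {}})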

Parse → Classify → Extract

When you need to determine document type before choosing an extraction schema:
# Step 1: Parse
parse_result = client.parse.run(input=document_url)
job_id = parse_result.job_id

# Step 2: Classify document type
classification = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": {
        "type": "object",
        "properties": {
            "document_type": {"type": "string", "enum": ["Invoice", "Receipt", "PO", "Other"]}
        }
    }}
)

# Extract result is a list; access the first item
doc_type = classification.result[0]["document_type"]

# Step 3: Extract with type-specific schema
if doc_type == "Invoice":
    schema = invoice_schema
elif doc_type == "Receipt":
    schema = receipt_schema
else:
    schema = generic_schema

result = client.extract.run(
    input=f"jobid://{job_id}",
    instructions={"schema": schema}
)
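The invoice_schema, receipt_schema, and generic_schema names are placeholders for your own definitions; minimal illustrative versions might look like:
invoice_schema = {
    "type": "object",
    "properties": {"invoice_number": {"type": "string"}, "total_amount": {"type": "number"}}
}
receipt_schema = {
    "type": "object",
    "properties": {"merchant": {"type": "string"}, "total_amount": {"type": "number"}}
}
generic_schema = {"type": "object", "properties": {"summary": {"type": "string"}}}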
This pattern is useful when processing mixed document types from a single upload queue.

Multiple job IDs

Extract also accepts a list of job IDs, which combines the parsed content from multiple documents into a single extraction context:
# Parse multiple documents (documents can be URLs or uploaded file IDs)
job_ids = []
for doc in documents:
    result = client.parse.run(input=doc)
    job_ids.append(result.job_id)

# Extract across all documents
combined_result = client.extract.run(
    input=[f"jobid://{jid}" for jid in job_ids],
    instructions={"schema": aggregation_schema}
)
This behaves like a multi-document pipeline: the extraction sees all documents together and returns a single result. If you need values from each individual document, design your schema to capture them explicitly, as in the sketch below.
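A schema that keeps per-document values separate might look like the following; the field names are illustrative:
aggregation_schema = {
    "type": "object",
    "properties": {
        "documents": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "total_amount": {"type": "number"}
                }
            }
        },
        "combined_total": {"type": "number"}
    }
}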

Supported endpoints

Endpoint | Accepts jobid:// | Notes
--- | --- | ---
Parse | Yes | Reprocesses with different settings
Extract | Yes | Single ID or list of IDs
Split | Yes | Single ID only
Edit | No | Requires actual document URL

Credit savings

When you use jobid://, you only pay parse credits once regardless of how many subsequent calls you make:
Without chaining | With chaining
--- | ---
Parse (4 credits) | Parse (4 credits)
Extract #1 (4 + 2 credits) | Extract #1 (2 credits)
Extract #2 (4 + 2 credits) | Extract #2 (2 credits)
Total: 16 credits | Total: 8 credits
The savings scale with document size and number of operations.
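Using the example costs above (4 credits per parse, 2 per extract), the comparison reduces to simple arithmetic; the numbers are illustrative and actual per-operation costs depend on document size and plan:
parse_cost, extract_cost, n_extracts = 4, 2, 2  # illustrative values from the table above

# Without chaining, each Extract re-parses the document internally.
without_chaining = parse_cost + n_extracts * (parse_cost + extract_cost)  # 16 credits
with_chaining = parse_cost + n_extracts * extract_cost                    # 8 credits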

Job ID retention

Parse job IDs are retained for 12 hours by default. If you need to chain calls after this window, you’ll have to re-parse the document. For workflows that span longer periods, consider storing the parsed content or using pipelines, which handle this automatically.
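If a workflow might outlive the retention window, one defensive pattern is to fall back to re-parsing when a chained call fails. This is a sketch; the exact exception raised for an expired or unknown job ID depends on the SDK, so the broad except clause is an assumption to replace with your own error handling:
def extract_with_fallback(client, job_id, document, schema):
    # Try the cached parse result first; re-parse the original document if the job ID has expired.
    try:
        return client.extract.run(input=f"jobid://{job_id}", instructions={"schema": schema})
    except Exception:  # assumption: substitute the SDK's specific error type here
        parse_result = client.parse.run(input=document)
        return client.extract.run(
            input=f"jobid://{parse_result.job_id}",
            instructions={"schema": schema}
        )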