## How it works

Instead of passing a single document URL, you pass a list of documents (as shown below). The pipeline then:

- Parses all documents in parallel
- Combines the parsed content into a single context
- Runs extraction once across the combined content
- Returns parse results as an array (one per document) and extract as a single result
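For instance, with a hypothetical HTTP API (the endpoint and parameter names here are placeholders for illustration, not the real interface):

```python
import requests

# Hypothetical endpoint and payload shape; consult the API reference
# for the actual parameter names.
response = requests.post(
    "https://api.example.com/v1/pipelines/run",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "pipeline_id": "pipe_abc123",
        # A list of documents instead of a single URL:
        "documents": [
            "https://example.com/contract.pdf",
            "https://example.com/exhibit-a.pdf",
            "https://example.com/exhibit-b.pdf",
        ],
    },
)
result = response.json()
```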
## Understanding the response

The key difference from single-document pipelines is that `result.parse` becomes an array:
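A sketch of the shape, where field names other than `parse` and `extract` are invented for illustration:

```python
result = {
    "parse": [
        # One entry per input document, in input order.
        {"filename": "contract.pdf", "content": "..."},
        {"filename": "exhibit-a.pdf", "content": "..."},
        {"filename": "exhibit-b.pdf", "content": "..."},
    ],
    # Extract runs once over the combined content, so it stays a
    # single object rather than an array.
    "extract": {"total_spend": 6000},
}

# Because parse is an array, handle results per document:
for parsed in result["parse"]:
    print(parsed["filename"], len(parsed["content"]))
```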
## When to use multi-document pipelines

Multi-document pipelines work best when your documents are related and the extraction benefits from seeing them together:

| Good use cases | Why it works |
|---|---|
| Contract with exhibits split into separate PDFs | Extract can reference terms from main contract when processing exhibits |
| Multi-page report scanned as individual files | Reassemble logical document from physical pages |
| Application with supporting documents | Extract can cross-reference between application form and attachments |
## Schema design matters

Because extraction runs once across combined content, your schema determines what you get back. Consider three invoices with amounts $1,000, $2,000, and $3,000:

Singular field schema:

`{"total_amount": 1000}` → picks one value, not a sum
Aggregate field schema:
{"total_spend": 6000} β LLM computes the aggregate
Array field schema:
{"invoices": [{"invoice_number": "A001", "amount": 1000}, ...]} β extracts from each
The LLM sees all documents together, so it can answer questions that span them. But you need to design your schema to ask the right questions.
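As a concrete sketch, an array-field schema for the invoice case could be declared with JSON Schema (the exact schema syntax your pipeline accepts may differ):

```python
# Array field: each invoice contributes its own entry, instead of a
# single scalar that forces the model to pick or aggregate one value.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoices": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "invoice_number": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
        # Optional aggregate field computed across all documents.
        "total_spend": {"type": "number"},
    },
}
```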
## Requirements and limitations

Requirements:

- Pipeline must include an Extract step
- At least one document required
- All documents must be accessible URLs or uploaded files
Limitations:

- Split is not supported with multi-document pipelines
- Only one Extract step is supported
- Edit pipelines don't support multi-document input
## Credits

Multi-document pipelines bill for:

- Parse credits for each document (based on page count)
- Extract credits once for the combined extraction
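As a worked example, assuming one parse credit per page and a flat one-credit extract charge (substitute your plan's actual rates):

```python
# Illustrative rates only; real credit costs come from your plan.
PARSE_CREDITS_PER_PAGE = 1
EXTRACT_CREDITS = 1  # charged once for the combined extraction

page_counts = [2, 2, 2]  # three 2-page invoices
total = sum(PARSE_CREDITS_PER_PAGE * pages for pages in page_counts) + EXTRACT_CREDITS
print(total)  # 7 credits
```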