Multi-document pipelines let you pass several documents to one pipeline call. Reducto parses each document in parallel, combines the parsed content, then runs a single extraction across all of them. This is useful when related information spans multiple files, such as a contract split across several PDFs or supporting documents that reference each other.

How it works

Instead of passing a single document URL, you pass a list:
from reducto import Reducto

client = Reducto()  # assumes REDUCTO_API_KEY is set in the environment

result = client.pipeline.run(
    input=[upload1.file_id, upload2.file_id, upload3.file_id],
    pipeline_id="your_pipeline",
)
Reducto then:
  1. Parses all documents in parallel
  2. Combines the parsed content into a single context
  3. Runs extraction once across the combined content
  4. Returns parse results as an array (one per document) and extract as a single result
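For context, here is a minimal sketch of how the upload handles used above might be created. The client.upload helper and the local file names are assumptions for illustration; accessible document URLs work as well:
from pathlib import Path

# Hypothetical local files; each upload is assumed to return an
# object exposing a file_id, as used in the run call above.
upload1 = client.upload(file=Path("contract.pdf"))
upload2 = client.upload(file=Path("exhibit_a.pdf"))
upload3 = client.upload(file=Path("exhibit_b.pdf"))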

Understanding the response

The key difference from single-document pipelines is that result.parse becomes an array:
{
  "job_id": "pipeline-abc123",
  "usage": {"num_pages": 3, "credits": 8.0},
  "result": {
    "parse": [
      {"job_id": "parse-001", "result": {...}, "usage": {"num_pages": 1}},
      {"job_id": "parse-002", "result": {...}, "usage": {"num_pages": 1}},
      {"job_id": "parse-003", "result": {...}, "usage": {"num_pages": 1}}
    ],
    "extract": {
      "job_id": "extract-combined",
      "result": {
        "fieldName": {"value": "...", "citations": [...]}
      }
    }
  }
}
The parse array maintains the same order as your input documents. The extract result is a single object representing one extraction across all combined content.
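In code, reading those fields might look like the sketch below; the attribute names mirror the JSON above, but how your SDK version exposes them may differ:
# Parse results come back in the same order as the input documents
first_doc = result.result.parse[0].result

# The single combined extraction across all documents
extracted = result.result.extract.result
print(extracted["fieldName"]["value"])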

When to use multi-document pipelines

Multi-document pipelines work best when your documents are related and the extraction benefits from seeing them together:
  • Contract with exhibits split into separate PDFs: Extract can reference terms from the main contract when processing the exhibits
  • Multi-page report scanned as individual files: reassembles the logical document from its physical pages
  • Application with supporting documents: Extract can cross-reference the application form and its attachments
Multi-document pipelines are not the same as batch processing. If you have independent documents that don't relate to each other (like 100 different customer invoices), use batch processing instead. Batch processing runs your pipeline independently on each document.
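To make the contrast concrete, here is a rough sketch of the batch-style alternative: one independent pipeline run per document. The uploads list is hypothetical, and Reducto's dedicated batch API may look different; this only illustrates the independence:
# One run, and one separate result, per document
results = [
    client.pipeline.run(input=u.file_id, pipeline_id="your_pipeline")
    for u in uploads  # hypothetical list of uploaded documents
]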

Schema design matters

Because extraction runs once across combined content, your schema determines what you get back. Consider three invoices with amounts $1,000, $2,000, and $3,000.

Singular field schema:
{"total_amount": "number"}
Result: {"total_amount": 1000} (picks one value, not a sum)

Aggregate field schema:
{"total_spend": "The sum of all invoice amounts across all documents"}
Result: {"total_spend": 6000} (the LLM computes the aggregate)

Array field schema:
{"invoices": [{"invoice_number": "string", "amount": "number"}]}
Result: {"invoices": [{"invoice_number": "A001", "amount": 1000}, ...]} (extracts an entry from each invoice)

The LLM sees all documents together, so it can answer questions that span them. But you need to design your schema to ask the right questions.

Requirements and limitations

Requirements:
  • Pipeline must include an Extract step
  • At least one document required
  • All documents must be accessible URLs or uploaded files
Limitations:
  • Split is not supported with multi-document pipelines
  • Only one Extract step is supported
  • Edit pipelines don't support multi-document input

Credits

Multi-document pipelines bill for:
  • Parse credits for each document (based on page count)
  • Extract credits once for the combined extraction
If you need to run the same extraction independently on many documents, batch processing is more appropriate and gives you separate results per document.
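If you want to see how those charges break down for a given run, the usage fields from the response example above can be inspected directly. A sketch, assuming the attribute layout shown earlier:
# Total billed usage for the whole pipeline run
print(result.usage.num_pages, result.usage.credits)

# Per-document parse usage, one entry per input
for parsed in result.result.parse:
    print(parsed.job_id, parsed.usage.num_pages)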