
# Multi-document Pipelines

> Process related documents together with shared context for extraction

Multi-document pipelines let you pass several documents to one pipeline call. Reducto parses each document in parallel, combines the parsed content, then runs a single extraction across all of them. This is useful when related information spans multiple files, such as a contract split across several PDFs or supporting documents that reference each other.

## How it works

Instead of passing a single document as `input`, you pass a list of documents:

<CodeGroup>
  ```python Python theme={null}
  result = client.pipeline.run(
      input=[upload1.file_id, upload2.file_id, upload3.file_id],
      pipeline_id="your_pipeline"
  )
  ```

  ```typescript TypeScript theme={null}
  const result = await client.pipeline.run({
      input: [upload1.file_id, upload2.file_id, upload3.file_id],
      pipeline_id: "your_pipeline"
  });
  ```

  ```bash cURL theme={null}
  curl -X POST "https://platform.reducto.ai/pipeline" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": [
        "reducto://file1.pdf",
        "reducto://file2.pdf",
        "reducto://file3.pdf"
      ],
      "pipeline_id": "your_pipeline"
    }'
  ```

  ```go Go theme={null}
  // Go SDK does not yet support pipelines; use HTTP directly.
  // Requires imports: bytes, encoding/json, log, net/http, os
  payload := map[string]interface{}{
      "input":       []string{"reducto://file1.pdf", "reducto://file2.pdf", "reducto://file3.pdf"},
      "pipeline_id": "your_pipeline",
  }
  body, err := json.Marshal(payload)
  if err != nil {
      log.Fatal(err)
  }

  req, err := http.NewRequest("POST", "https://platform.reducto.ai/pipeline", bytes.NewReader(body))
  if err != nil {
      log.Fatal(err)
  }
  req.Header.Set("Authorization", "Bearer "+os.Getenv("REDUCTO_API_KEY"))
  req.Header.Set("Content-Type", "application/json")

  // Check the error before deferring Close; resp is nil when the request fails.
  resp, err := http.DefaultClient.Do(req)
  if err != nil {
      log.Fatal(err)
  }
  defer resp.Body.Close()
  ```
</CodeGroup>

Reducto then:

1. Parses all documents in parallel
2. Combines the parsed content into a single context
3. Runs extraction once across the combined content
4. Returns parse results as an array (one per document) and extract as a single result
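
As a mental model only, the four steps above can be sketched in plain Python. `parse_document` and `extract` are hypothetical stand-ins, not Reducto APIs, and the real service may combine content differently.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical stand-in for a parse step; returns fake parsed content.
def parse_document(doc: str) -> dict:
    return {"source": doc, "text": f"parsed content of {doc}"}


# Hypothetical stand-in for an extract step; a real one would apply your schema.
def extract(combined_text: str) -> dict:
    return {"characters_seen": len(combined_text)}


def run_multi_document(docs: list[str]) -> dict:
    # 1. Parse all documents in parallel; map() yields results in input order.
    with ThreadPoolExecutor() as pool:
        parsed = list(pool.map(parse_document, docs))
    # 2. Combine the parsed content into a single context.
    combined = "\n\n".join(p["text"] for p in parsed)
    # 3. Run extraction once across the combined content.
    extracted = extract(combined)
    # 4. Return one parse result per document and a single extract result.
    return {"parse": parsed, "extract": extracted}


result = run_multi_document(["file1.pdf", "file2.pdf", "file3.pdf"])
print(len(result["parse"]), "parse results, 1 extract result")
```

The point of the sketch is the shape of the output: the parse list grows with the number of inputs, while the extract result stays a single object.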

## Understanding the response

The key difference from single-document pipelines is that `result.parse` becomes an array:

```json theme={null}
{
  "job_id": "pipeline-abc123",
  "usage": {"num_pages": 3, "credits": 8.0},
  "result": {
    "parse": [
      {"job_id": "parse-001", "result": {...}, "usage": {"num_pages": 1}},
      {"job_id": "parse-002", "result": {...}, "usage": {"num_pages": 1}},
      {"job_id": "parse-003", "result": {...}, "usage": {"num_pages": 1}}
    ],
    "extract": {
      "job_id": "extract-combined",
      "result": {
        "fieldName": {"value": "...", "citations": [...]}
      }
    }
  }
}
```

The parse array maintains the same order as your input documents. The extract result is a single object representing one extraction across all combined content.
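
A minimal sketch of reading this response, assuming it has been decoded into a plain Python dict (for example, from the cURL call) and that you kept the list of inputs you sent. The elided `result` bodies are left empty here, and the variable names are illustrative.

```python
# Trimmed copy of the example response above; elided bodies are left empty.
response = {
    "job_id": "pipeline-abc123",
    "usage": {"num_pages": 3, "credits": 8.0},
    "result": {
        "parse": [
            {"job_id": "parse-001", "result": {}, "usage": {"num_pages": 1}},
            {"job_id": "parse-002", "result": {}, "usage": {"num_pages": 1}},
            {"job_id": "parse-003", "result": {}, "usage": {"num_pages": 1}},
        ],
        "extract": {"job_id": "extract-combined", "result": {}},
    },
}

# The same inputs, in the same order, that were sent in the request.
inputs = ["reducto://file1.pdf", "reducto://file2.pdf", "reducto://file3.pdf"]

parse_results = response["result"]["parse"]

# Pair each input with its parse result; this relies on the order guarantee.
by_input = dict(zip(inputs, parse_results))
for source, parsed in by_input.items():
    print(source, parsed["job_id"], parsed["usage"]["num_pages"])

# In this example, per-document page counts add up to the top-level usage.
total_pages = sum(p["usage"]["num_pages"] for p in parse_results)
print(total_pages == response["usage"]["num_pages"])

# There is exactly one extract result, regardless of how many inputs were sent.
extract_job_id = response["result"]["extract"]["job_id"]
```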

## When to use multi-document pipelines

Multi-document pipelines work best when your documents are related and the extraction benefits from seeing them together:

| Good use cases                                  | Why it works                                                            |
| ----------------------------------------------- | ----------------------------------------------------------------------- |
| Contract with exhibits split into separate PDFs | Extract can reference terms from main contract when processing exhibits |
| Multi-page report scanned as individual files   | Reassemble logical document from physical pages                         |
| Application with supporting documents           | Extract can cross-reference between application form and attachments    |

<Warning>
  Multi-document pipelines are not the same as batch processing. If you have independent documents that don't relate to each other (like 100 different customer invoices), use [batch processing](/workflows/batch-processing) instead. Batch processing runs your pipeline independently on each document.
</Warning>

## Schema design matters

Because extraction runs once across combined content, your schema determines what you get back. Consider three invoices with amounts of \$1,000, \$2,000, and \$3,000:

**Singular field schema:**

```json theme={null}
{"total_amount": "number"}
```

Result: `{"total_amount": 1000}`. The extraction returns just one of the three amounts, not their sum.

**Aggregate field schema:**

```json theme={null}
{"total_spend": "The sum of all invoice amounts across all documents"}
```

Result: `{"total_spend": 6000}`. The LLM computes the aggregate described in the field.

**Array field schema:**

```json theme={null}
{"invoices": [{"invoice_number": "string", "amount": "number"}]}
```

Result: `{"invoices": [{"invoice_number": "A001", "amount": 1000}, ...]}`. The extraction returns one entry per invoice.

The LLM sees all documents together, so it can answer questions that span them. But you need to design your schema to ask the right questions.
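
One way to sanity-check an LLM-computed aggregate is to request the array in the same schema and recompute the sum client-side. The sketch below assumes a hypothetical extraction output containing both `total_spend` and `invoices`. Invoice numbers other than `A001` are illustrative, and citation wrappers are omitted for brevity.

```python
# Hypothetical extraction output for a schema that asks for both fields.
extracted = {
    "total_spend": 6000,
    "invoices": [
        {"invoice_number": "A001", "amount": 1000},
        {"invoice_number": "A002", "amount": 2000},  # illustrative value
        {"invoice_number": "A003", "amount": 3000},  # illustrative value
    ],
}

# Recompute the aggregate from the per-invoice amounts.
recomputed_total = sum(invoice["amount"] for invoice in extracted["invoices"])

# Flag a mismatch instead of silently trusting either number.
totals_match = recomputed_total == extracted["total_spend"]
print(recomputed_total, totals_match)
```

If the two numbers disagree, the per-invoice array is usually the easier one to audit, since each entry can be traced back to a single document.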

## Requirements and limitations

**Requirements:**

* Pipeline must include an Extract step
* At least one document required
* All documents must be accessible URLs or uploaded files

**Limitations:**

* Split is not supported with multi-document pipelines
* Only one Extract step is supported
* Edit pipelines don't support multi-document input
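
A minimal pre-flight check for the rules above, assuming you can describe your pipeline as a list of lowercase step names. That representation and the function name are hypothetical; Reducto does not necessarily expose pipeline steps this way.

```python
def check_multi_document_request(documents: list[str], steps: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []

    # Requirement: at least one document.
    if len(documents) < 1:
        problems.append("At least one document is required")

    # Requirement: an Extract step. Limitation: only one Extract step.
    extract_count = steps.count("extract")
    if extract_count == 0:
        problems.append("Pipeline must include an Extract step")
    elif extract_count > 1:
        problems.append("Only one Extract step is supported")

    # Limitations: Split and Edit are not supported with multi-document input.
    if "split" in steps:
        problems.append("Split is not supported with multi-document pipelines")
    if "edit" in steps:
        problems.append("Edit pipelines don't support multi-document input")

    return problems


ok = check_multi_document_request(["a.pdf", "b.pdf"], ["parse", "extract"])
bad = check_multi_document_request([], ["parse", "split", "extract", "extract"])
print(ok)
print(bad)
```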

## Credits

Multi-document pipelines bill for:

* Parse credits for each document (based on page count)
* Extract credits once for the combined extraction

If you need to run the same extraction independently on many documents, batch processing is more appropriate and gives you separate results per document.

***

## Related

<CardGroup cols={2}>
  <Card title="Pipeline Basics" href="/workflows/pipeline-basics">
    Understanding pipeline patterns and response structures.
  </Card>

  <Card title="Batch Processing" href="/workflows/batch-processing">
    Process many independent documents in parallel.
  </Card>

  <Card title="Array Extraction" href="/configs/extract/array-extraction">
    Extract arrays of items from documents.
  </Card>

  <Card title="Pipeline API Reference" href="/api-reference/pipeline">
    Full API documentation for pipeline calls.
  </Card>
</CardGroup>