# Deep Extract

> Achieve near-perfect accuracy on complex and long extractions with our agentic loop

## What is Deep Extract?

Deep Extract is an agentic extraction mode that iteratively refines its output to achieve near-perfect accuracy. Unlike standard extraction, which makes a single pass over the document, Deep Extract runs an agentic loop that verifies its results against the source document and corrects them until a quality threshold is met.

This is especially powerful for complex documents where a single extraction pass may miss values, misalign table rows, or produce inconsistencies. Deep Extract catches these issues by checking its own work and re-extracting until the results are accurate.
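Conceptually, an extract-verify-correct loop can be pictured as the sketch below. It is an illustrative model only, not Reducto's implementation: `refine_until_verified`, the toy extract and verify callables, and the `max_rounds` cap are all hypothetical. Each round re-extracts using the issues found in the previous round and stops as soon as verification reports none.

```python
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def refine_until_verified(
    extract: Callable[[List[str]], T],
    verify: Callable[[T], List[str]],
    max_rounds: int = 5,
) -> Tuple[T, int, bool]:
    """Hypothetical extract-verify-correct loop; not Reducto's internal code.

    `extract` receives the issues found in the previous round (empty on the
    first pass); `verify` returns an empty list once the result is acceptable.
    Returns (result, rounds_used, verified).
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        result = extract(feedback)
        feedback = verify(result)
        if not feedback:
            return result, round_no, True
    # Quality threshold never met: hand back the last attempt, flagged as unverified.
    return result, max_rounds, False


# Toy usage: the "document" has three line items totalling 60.
DOCUMENT_ROWS = [10, 20, 30]
DOCUMENT_TOTAL = 60


def toy_extract(feedback: List[str]) -> List[int]:
    # The first pass "misses" the last row; any feedback triggers a full re-read.
    return DOCUMENT_ROWS if feedback else DOCUMENT_ROWS[:2]


def toy_verify(rows: List[int]) -> List[str]:
    if sum(rows) != DOCUMENT_TOTAL:
        return [f"line items sum to {sum(rows)}, expected {DOCUMENT_TOTAL}"]
    return []


rows, attempts, verified = refine_until_verified(toy_extract, toy_verify)
print(rows, attempts, verified)  # [10, 20, 30] 2 True
```

The cap on rounds matters in any loop of this shape: without it, a document whose values genuinely cannot be reconciled would never terminate.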

***

## When to Use It

Deep Extract is designed for extractions where accuracy is critical and the cost of errors is high. Common use cases include:

* **Invoice line item extraction** — Ensuring every line item is captured and that totals reconcile with the sum of individual amounts
* **Financial statement processing** — Extracting balance sheets, income statements, or cash flow data where numbers must be internally consistent
* **Legal document extraction** — Pulling clauses, dates, and party information from contracts where missing a single field has real consequences
* **Medical and insurance forms** — Capturing patient data, procedure codes, and billing amounts that must be complete and correct
* **Multi-page tables** — Documents with hundreds of rows spanning many pages where standard extraction may truncate or skip entries

If your extraction is simple (a few scalar fields from a short document), standard extraction is sufficient. Use Deep Extract when you need high reliability on complex or lengthy documents.

***

## How to Use It

Enable Deep Extract by setting `deep_extract` to `true` in the `settings` object:

<CodeGroup>
  ```python Python theme={null}
  result = client.extract.run(
      input=upload.file_id,
      instructions={
          "schema": schema,
          "system_prompt": "Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document."
      },
      settings={
          "deep_extract": True
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.extract.run({
    input: upload.file_id,
    instructions: {
      schema,
      system_prompt: 'Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document.'
    },
    settings: {
      deep_extract: true
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/extract \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "instructions": {
        "schema": {...},
        "system_prompt": "Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document."
      },
      "settings": {
        "deep_extract": true
      }
    }'
  ```
</CodeGroup>
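If you are not using an SDK, the cURL call above maps directly onto a plain HTTP request. The sketch below uses only Python's standard library to build the same request body and headers as the cURL example; the `build_extract_request` helper and the placeholder schema are hypothetical, and nothing is sent until you pass the request to `urllib.request.urlopen`.

```python
import json
import os
import urllib.request

EXTRACT_URL = "https://platform.reducto.ai/extract"


def build_extract_request(file_id: str, schema: dict, system_prompt: str,
                          deep_extract: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /extract request mirroring the cURL example."""
    body = {
        "input": file_id,  # e.g. "reducto://your-file-id"
        "instructions": {"schema": schema, "system_prompt": system_prompt},
        "settings": {"deep_extract": deep_extract},
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('REDUCTO_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_extract_request(
    "reducto://your-file-id",
    schema={"type": "object", "properties": {}},  # hypothetical placeholder schema
    system_prompt=(
        "Extract all line items from this invoice. "
        "Iterate until the line items sum up to the total listed in the document."
    ),
)
# To actually send it:
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```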

***

## Best Practices

### Add verification criteria to your system prompt

The agentic loop uses your system prompt to determine when extraction is "good enough." Including explicit verification criteria gives the agent a concrete goal to iterate toward.

```python theme={null}
instructions={
    "schema": invoice_schema,
    "system_prompt": (
        "Extract all line items from this invoice. "
        "Iterate until the line items sum up to the total listed in the document."
    )
}
```
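The same criterion is worth re-checking on your side once the result comes back, so that a mismatch is flagged rather than silently accepted. A minimal sketch, assuming a hypothetical schema whose output has a `line_items` array (each item with an `amount`) and a top-level `total`; adjust the field names to your own schema and to wherever your result object exposes the extracted values.

```python
from decimal import Decimal
from typing import Any, Mapping


def line_items_reconcile(extracted: Mapping[str, Any], tolerance: str = "0.01") -> bool:
    """Return True if the line item amounts sum to the invoice total.

    Field names (line_items, amount, total) are assumptions about your schema.
    Amounts pass through str() into Decimal to avoid binary float rounding.
    """
    items_sum = sum(
        (Decimal(str(item["amount"])) for item in extracted["line_items"]),
        Decimal("0"),
    )
    total = Decimal(str(extracted["total"]))
    return abs(items_sum - total) <= Decimal(tolerance)


invoice = {
    "line_items": [{"amount": 19.99}, {"amount": 5.01}, {"amount": 75}],
    "total": 100.00,
}
print(line_items_reconcile(invoice))  # True
```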

Other examples of effective verification criteria:

* **Financial documents:** "Verify that the sum of all transaction amounts equals the statement total."
* **Multi-page tables:** "Ensure every row in the table is captured. The document states there are N entries — verify the count matches."
* **Contracts:** "Confirm that all parties listed in the signature block are captured in the parties array."
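The multi-page table and contract criteria can be mirrored client-side in the same way. Both helpers below are hypothetical and assume result shapes you would define in your own schema: a `rows` array plus a `stated_entry_count` field capturing the "N entries" figure printed in the document, and `parties` and `signatories` arrays of name strings.

```python
from typing import Any, List, Mapping


def row_count_matches(extracted: Mapping[str, Any]) -> bool:
    """Multi-page tables: compare captured rows with the count the document states.

    Assumes hypothetical `rows` and `stated_entry_count` fields in your schema.
    """
    return len(extracted["rows"]) == int(extracted["stated_entry_count"])


def missing_signatories(extracted: Mapping[str, Any]) -> List[str]:
    """Contracts: list signature-block names that are absent from the parties array.

    Assumes hypothetical `parties` and `signatories` arrays of names; names are
    compared case-insensitively after trimming surrounding whitespace.
    """
    parties = {name.strip().casefold() for name in extracted["parties"]}
    return [
        name for name in extracted["signatories"]
        if name.strip().casefold() not in parties
    ]


table = {"rows": [{"id": 1}, {"id": 2}], "stated_entry_count": 3}
contract = {"parties": ["Acme Corp", "Globex LLC"], "signatories": ["Acme Corp", "Initech Inc"]}
print(row_count_matches(table))       # False
print(missing_signatories(contract))  # ['Initech Inc']
```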

### Use with a well-defined schema

Deep Extract works best when your schema has clear field names and descriptions. The agent uses these to understand what it's looking for during each iteration. See [Extract Best Practices](/extraction/best-practices-extract) for schema design guidance.
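As an illustration, an `invoice_schema` like the one passed in the verification example might give every field a description the agent can check against. This sketch assumes a JSON Schema-style definition; the field names and descriptions are hypothetical. The small `undocumented_fields` helper (also hypothetical) lists any property that lacks a description, which is a quick way to catch gaps before running an extraction.

```python
from typing import Any, Dict, List

invoice_schema: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "description": "Every line item row in the invoice table, across all pages.",
            "items": {
                "type": "object",
                "properties": {
                    "item": {"type": "string", "description": "Item or service name as printed."},
                    "quantity": {"type": "number", "description": "Quantity billed on this line."},
                    "amount": {"type": "number", "description": "Line total, without currency symbols."},
                },
                "required": ["item", "amount"],
            },
        },
        "total": {"type": "number", "description": "Grand total due as printed on the invoice."},
    },
    "required": ["line_items", "total"],
}


def undocumented_fields(schema: Dict[str, Any], path: str = "") -> List[str]:
    """Return dotted paths of properties that have no description."""
    missing: List[str] = []
    for name, prop in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not prop.get("description"):
            missing.append(field_path)
        # Recurse into nested objects and into array item schemas.
        missing.extend(undocumented_fields(prop, field_path))
        if "items" in prop:
            missing.extend(undocumented_fields(prop["items"], field_path + "[]"))
    return missing


print(undocumented_fields(invoice_schema))  # []
```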

### Pair with Parse configuration

Deep Extract can only verify and refine what Parse sees. If the underlying parse output is missing data (e.g., a table isn't detected), Deep Extract won't be able to find it either. Consider enabling [agentic mode](/configs/parse/agentic-modes) for tables or using [HTML table output](/configs/parse/table-output-formats) for complex documents.

***

## Related

<CardGroup cols={2}>
  <Card title="Extract Overview" icon="file-export" href="/extract/overview">
    Endpoint basics and parameters.
  </Card>

  <Card title="Best Practices" icon="lightbulb" href="/extraction/best-practices-extract">
    Schema design and prompt writing tips.
  </Card>

  <Card title="Citations" icon="quote-left" href="/configs/extract/citations">
    Link values to source locations.
  </Card>

  <Card title="Credit Usage" icon="coins" href="/reference/credit-usage">
    Understand how credits are calculated.
  </Card>
</CardGroup>
