Deep Extract

What is Deep Extract?

Deep Extract is an agentic extraction mode that iteratively refines its output to achieve near-perfect accuracy. Unlike standard extraction, which makes a single pass over the document, Deep Extract runs an agentic loop that checks its results against the source document and corrects them until a quality threshold is met. This makes it especially useful for complex documents, where a single extraction pass may miss values, misalign table rows, or produce inconsistent results. Deep Extract catches these issues by checking its own work and re-extracting until the results pass verification.
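To make the verify-and-correct idea concrete, here is a purely conceptual sketch of such a loop in plain Python. It is not Reducto's implementation: `refine_until_valid`, the toy `toy_extract` and `toy_check` functions, and the `max_rounds` cap are all hypothetical, chosen only to show the shape of an extract, verify, re-extract cycle that stops once a check passes.

```python
from typing import Callable, Optional, Tuple, TypeVar

T = TypeVar("T")


def refine_until_valid(
    extract: Callable[[Optional[str]], T],
    check: Callable[[T], Optional[str]],
    max_rounds: int = 5,
) -> Tuple[T, int]:
    """Conceptual extract/verify/re-extract loop (not Reducto's implementation).

    `extract` gets the feedback from the previous round (None on the first pass).
    `check` returns None when the result passes, otherwise a feedback message.
    Returns the last result and the number of rounds used.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        result = extract(feedback)   # (re-)extract, guided by the last round's feedback
        feedback = check(result)     # verify the result against the source
        if feedback is None:         # quality threshold met: stop iterating
            return result, round_no
    return result, max_rounds        # give up and return the best-effort result


# Toy "document": three line items that sum to the stated total.
DOCUMENT = {"line_items": [40, 35, 25], "total": 100}


def toy_extract(feedback: Optional[str]) -> dict:
    # Simulates a first pass that drops the last row; a guided retry reads every row.
    rows = DOCUMENT["line_items"] if feedback else DOCUMENT["line_items"][:-1]
    return {"line_items": list(rows), "total": DOCUMENT["total"]}


def toy_check(result: dict) -> Optional[str]:
    # Passes only when the extracted line items reconcile with the stated total.
    gap = result["total"] - sum(result["line_items"])
    return None if gap == 0 else f"line items are {gap} short of the total"


result, rounds = refine_until_valid(toy_extract, toy_check)
print(rounds, result)  # → 2 {'line_items': [40, 35, 25], 'total': 100}
```

The `max_rounds` cap reflects a common design choice for agentic loops: bounding cost when a check can never be satisfied. How Deep Extract itself limits iterations is not described on this page.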

When to Use It

Deep Extract is designed for extractions where accuracy is critical and the cost of errors is high. Common use cases include:

Invoice line item extraction — Ensuring every line item is captured and that totals reconcile with the sum of individual amounts
Financial statement processing — Extracting balance sheets, income statements, or cash flow data where numbers must be internally consistent
Legal document extraction — Pulling clauses, dates, and party information from contracts where missing a single field has real consequences
Medical and insurance forms — Capturing patient data, procedure codes, and billing amounts that must be complete and correct
Multi-page tables — Documents with hundreds of rows spanning many pages where standard extraction may truncate or skip entries

If your extraction is simple (a few scalar fields from a short document), standard extraction is sufficient. Use Deep Extract when you need high reliability on complex or lengthy documents.

How to Use It

Enable Deep Extract by setting deep_extract to true in the settings object:

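Python

from pathlib import Path
from reducto import Reducto

client = Reducto()  # reads REDUCTO_API_KEY from the environment
upload = client.upload(file=Path("invoice.pdf"))
# `schema` is the JSON schema describing the fields you want extracted

result = client.extract.run(
    input=upload.file_id,
    instructions={
        "schema": schema,
        "system_prompt": "Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document."
    },
    settings={
        "deep_extract": True  # enable the agentic verify-and-refine loop
    }
)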

TypeScript

const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema,
    system_prompt: 'Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document.'
  },
  settings: {
    deep_extract: true
  }
});

cURL

curl -X POST https://platform.reducto.ai/extract \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "reducto://your-file-id",
    "instructions": {
      "schema": {...},
      "system_prompt": "Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document."
    },
    "settings": {
      "deep_extract": true
    }
  }'
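For environments without an SDK, the cURL request can be reproduced with Python's standard library. This minimal sketch only builds the request object; `build_deep_extract_request` is a hypothetical helper name, while the URL, headers, and body fields are taken directly from the cURL example. The schema passed in the example call is an empty placeholder.

```python
import json
import os
import urllib.request

EXTRACT_URL = "https://platform.reducto.ai/extract"


def build_deep_extract_request(file_id, schema, system_prompt, api_key):
    """Hypothetical helper: mirrors the cURL body, with deep_extract enabled."""
    body = {
        "input": file_id,
        "instructions": {"schema": schema, "system_prompt": system_prompt},
        "settings": {"deep_extract": True},
    }
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_deep_extract_request(
    "reducto://your-file-id",
    {"type": "object", "properties": {}},  # placeholder schema
    "Extract all line items from this invoice. "
    "Iterate until the line items sum up to the total listed in the document.",
    os.environ.get("REDUCTO_API_KEY", ""),
)

# To send it (requires a real file ID and API key):
# with urllib.request.urlopen(req) as resp:
#     result = json.load(resp)
```

Uncomment the `urlopen` lines to send the request once you have a real file ID and `REDUCTO_API_KEY` set.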

Best Practices

Add verification criteria to your system prompt

The agentic loop uses your system prompt to determine when extraction is “good enough.” Including explicit verification criteria gives the agent a concrete goal to iterate toward.

instructions={
    "schema": invoice_schema,
    "system_prompt": (
        "Extract all line items from this invoice. "
        "Iterate until the line items sum up to the total listed in the document."
    )
}
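If you reuse the same task description with different checks, it can help to assemble the system prompt from a task plus a list of verification criteria. `with_verification_criteria` is a hypothetical convenience function, not part of any SDK; it only joins strings, so the resulting prompt is identical to one written by hand.

```python
from typing import Iterable


def with_verification_criteria(task: str, criteria: Iterable[str]) -> str:
    """Hypothetical helper: append explicit stopping criteria to an extraction task."""
    parts = [task.strip()] + [c.strip() for c in criteria if c.strip()]
    return " ".join(parts)


system_prompt = with_verification_criteria(
    "Extract all line items from this invoice.",
    ["Iterate until the line items sum up to the total listed in the document."],
)
# Pass `system_prompt` as instructions["system_prompt"].
```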

Other examples of effective verification criteria:

Financial documents: “Verify that the sum of all transaction amounts equals the statement total.”
Multi-page tables: “Ensure every row in the table is captured. The document states there are N entries — verify the count matches.”
Contracts: “Confirm that all parties listed in the signature block are captured in the parties array.”
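Criteria like these can also be re-checked on your side once the extracted values come back, as an extra safeguard. The sketch below assumes your schema returns a `line_items` array whose entries have an `amount` field, plus a top-level `total`; those field names are hypothetical and should be adapted to your own schema. It uses `Decimal` so currency sums are not skewed by binary floating-point rounding.

```python
from decimal import Decimal


def reconciles(data, tolerance="0.01"):
    """Return (ok, difference) comparing the sum of line-item amounts to the total.

    Assumes hypothetical fields: data["line_items"][i]["amount"] and data["total"].
    Values go through str() so a float like 19.99 becomes Decimal("19.99").
    """
    items_sum = sum(
        (Decimal(str(item["amount"])) for item in data["line_items"]), Decimal("0")
    )
    difference = Decimal(str(data["total"])) - items_sum
    return abs(difference) <= Decimal(tolerance), difference


invoice = {
    "line_items": [{"amount": 19.99}, {"amount": 5.01}, {"amount": 75.00}],
    "total": 100.00,
}
ok, diff = reconciles(invoice)
print(ok, diff)
```

A non-zero `difference` tells you how far off the extraction is, which is useful for logging or for deciding whether to flag a document for review.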

Use with a well-defined schema

Deep Extract works best when your schema has clear field names and descriptions. The agent uses these to understand what it’s looking for during each iteration. See Extract Best Practices for schema design guidance.
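As an illustration, here is a small invoice schema written as a Python dict in JSON Schema style, with a description on every field, plus a hypothetical helper that flags properties lacking a description. The field names and descriptions are examples only, not a required format.

```python
invoice_schema = {
    "type": "object",
    "properties": {
        "line_items": {
            "type": "array",
            "description": "Every billed line on the invoice, in document order, across all pages.",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string", "description": "Item or service name as printed."},
                    "quantity": {"type": "number", "description": "Units billed on this line."},
                    "amount": {"type": "number", "description": "Line total in the invoice currency."},
                },
            },
        },
        "total": {"type": "number", "description": "Grand total printed on the invoice, including tax."},
    },
}


def missing_descriptions(schema, path="$"):
    """Hypothetical lint: list property paths that have no 'description'."""
    missing = []
    for name, prop in schema.get("properties", {}).items():
        prop_path = f"{path}.{name}"
        if not prop.get("description"):
            missing.append(prop_path)
        missing.extend(missing_descriptions(prop, prop_path))  # nested objects
        if "items" in prop:  # recurse into array item schemas
            missing.extend(missing_descriptions(prop["items"], prop_path + "[]"))
    return missing


print(missing_descriptions(invoice_schema))  # → []
```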

Pair with Parse configuration

Deep Extract can only verify and refine what Parse sees. If the underlying parse output is missing data (e.g., a table isn’t detected), Deep Extract won’t be able to find it either. Consider enabling agentic mode for tables or using HTML table output for complex documents.

Related

Extract Overview

Endpoint basics and parameters.

Best Practices

Schema design and prompt writing tips.

Citations

Link values to source locations.

Credit Usage

Understand how credits are calculated.
