Deep Extract is currently in beta. Configuration options and pricing are subject to change.

What is Deep Extract?

Deep Extract is an agentic extraction mode that iteratively refines its output, aiming for near-perfect accuracy. Unlike standard extraction, which makes a single pass over the document, Deep Extract runs an agentic loop that verifies its results against the source document and corrects them until a quality threshold is met. This is especially useful for complex documents, where a single extraction pass may miss values, misalign table rows, or produce inconsistencies: Deep Extract catches these issues by checking its own work and re-extracting until the results pass verification.
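The verify-and-correct loop described above can be pictured with a small sketch. This is a conceptual illustration only, not Deep Extract's actual implementation: the `extract` and `verify` callables, the `max_iterations` cap, and the use of a list of strings as feedback are all assumptions made for the example.

```python
from typing import Any, Callable


def refine_until_valid(
    extract: Callable[[list[str]], Any],
    verify: Callable[[Any], list[str]],
    max_iterations: int = 5,
) -> tuple[Any, list[str]]:
    """Re-run `extract` with verifier feedback until `verify` reports no problems.

    Returns the last result and any problems still outstanding.
    """
    problems: list[str] = []
    result = None
    for _ in range(max_iterations):
        result = extract(problems)  # each pass sees the previous pass's problems
        problems = verify(result)   # an empty list means the quality bar is met
        if not problems:
            break
    return result, problems


# Toy stand-ins (hypothetical): the first pass misses a line item,
# and the second pass recovers it once it receives feedback.
def toy_extract(feedback: list[str]) -> dict:
    items = [40.0, 35.0] if not feedback else [40.0, 35.0, 25.0]
    return {"line_items": items, "total": 100.0}


def toy_verify(result: dict) -> list[str]:
    if abs(sum(result["line_items"]) - result["total"]) > 0.005:
        return ["line items do not sum to the stated total"]
    return []


final, remaining = refine_until_valid(toy_extract, toy_verify)
print(final["line_items"], remaining)
```

The iteration cap is a design choice of the sketch: a loop that stops only on success could run indefinitely on a document whose stated total is itself wrong.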

When to Use It

Deep Extract is designed for extractions where accuracy is critical and the cost of errors is high. Common use cases include:
  • Invoice line item extraction — Ensuring every line item is captured and that totals reconcile with the sum of individual amounts
  • Financial statement processing — Extracting balance sheets, income statements, or cash flow data where numbers must be internally consistent
  • Legal document extraction — Pulling clauses, dates, and party information from contracts where missing a single field has real consequences
  • Medical and insurance forms — Capturing patient data, procedure codes, and billing amounts that must be complete and correct
  • Multi-page tables — Documents with hundreds of rows spanning many pages where standard extraction may truncate or skip entries
If your extraction is simple (a few scalar fields from a short document), standard extraction is sufficient. Use Deep Extract when you need high reliability on complex or lengthy documents.

How to Use It

Enable Deep Extract by setting alpha.deep_extract to true in the settings object:
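# Assumes `client` is an initialized API client, `upload` is the result of an
# earlier file upload, and `schema` is your extraction schema (not defined here).
result = client.extract.run(
    input=upload.file_id,
    instructions={
        "schema": schema,
        # The system prompt tells the agent what "done" means.
        "system_prompt": "Extract all line items from this invoice. Iterate until the line items sum up to the total listed in the document."
    },
    settings={
        "alpha": {
            "deep_extract": True  # enable the agentic verify-and-correct loop
        }
    }
)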

Best Practices

Add verification criteria to your system prompt

The agentic loop uses your system prompt to determine when extraction is “good enough.” Including explicit verification criteria gives the agent a concrete goal to iterate toward.
# Fragment: passed as the `instructions` argument to client.extract.run(...).
instructions={
    "schema": invoice_schema,
    "system_prompt": (
        "Extract all line items from this invoice. "
        "Iterate until the line items sum up to the total listed in the document."
    )
}
Other examples of effective verification criteria:
  • Financial documents: “Verify that the sum of all transaction amounts equals the statement total.”
  • Multi-page tables: “Ensure every row in the table is captured. The document states there are N entries — verify the count matches.”
  • Contracts: “Confirm that all parties listed in the signature block are captured in the parties array.”
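
Criteria like these can also be re-checked on your side once the result comes back. The following is a minimal sketch of such a client-side sanity check. It assumes the extracted data is available as a plain dictionary, and the field names `line_items`, `amount`, and `total` are hypothetical: substitute whatever your schema defines. It is not part of Deep Extract itself.

```python
from decimal import Decimal
from typing import Optional


def check_invoice(extracted: dict, expected_rows: Optional[int] = None) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    items = extracted["line_items"]
    # Decimal(str(...)) avoids binary floating-point drift when summing currency.
    item_sum = sum((Decimal(str(item["amount"])) for item in items), Decimal("0"))
    total = Decimal(str(extracted["total"]))
    if item_sum != total:
        problems.append(f"line items sum to {item_sum}, but total is {total}")
    # Optional row-count check, for documents that state how many entries they contain.
    if expected_rows is not None and len(items) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(items)}")
    return problems


# Hypothetical extracted data, for illustration only.
sample = {
    "line_items": [{"amount": 19.99}, {"amount": 0.01}, {"amount": 80.10}],
    "total": 100.10,
}
print(check_invoice(sample, expected_rows=3))
```

Summing with `Decimal` rather than `float` keeps the comparison exact, so a one-cent discrepancy is reported instead of being hidden by rounding.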

Use with a well-defined schema

Deep Extract works best when your schema has clear field names and descriptions. The agent uses these to understand what it’s looking for during each iteration. See Extract Best Practices for schema design guidance.
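
As an illustration of what "clear field names and descriptions" can look like, here is a sketch of an invoice schema written as a Python dictionary in JSON Schema style. Every field name and description below is a hypothetical example, not a required format. The small helper, also an assumption of this sketch, lists any fields that lack a description so gaps are easy to spot before running an extraction.

```python
# Hypothetical invoice schema in JSON Schema style; all names are examples.
invoice_schema = {
    "type": "object",
    "properties": {
        "total": {
            "type": "number",
            "description": "Grand total printed on the invoice, including tax.",
        },
        "line_items": {
            "type": "array",
            "description": "One entry per billed row in the line item table.",
            "items": {
                "type": "object",
                "properties": {
                    "description": {
                        "type": "string",
                        "description": "Text describing the product or service.",
                    },
                    "quantity": {
                        "type": "number",
                        "description": "Number of units billed on this row.",
                    },
                    "amount": {
                        "type": "number",
                        "description": "Row total for this line item, before tax.",
                    },
                },
                "required": ["description", "amount"],
            },
        },
    },
    "required": ["total", "line_items"],
}


def fields_missing_descriptions(schema: dict, path: str = "") -> list[str]:
    """Return dotted paths of all properties that have no description."""
    missing = []
    for name, spec in schema.get("properties", {}).items():
        field_path = f"{path}.{name}" if path else name
        if not spec.get("description"):
            missing.append(field_path)
        # Recurse into nested objects and into the item type of arrays.
        nested = spec.get("items", spec)
        missing.extend(fields_missing_descriptions(nested, field_path))
    return missing


print(fields_missing_descriptions(invoice_schema))
```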

Pair with Parse configuration

Deep Extract can only verify and refine what Parse sees. If the underlying parse output is missing data (e.g., a table isn’t detected), Deep Extract won’t be able to find it either. Consider enabling agentic mode for tables or using HTML table output for complex documents.