- To classify documents before performing extraction, see our pipelining documentation.
- To split up long documents into subsets, see our splitting endpoint.
Example use cases
- Extracting important numbers and statistics from a patient lab report.
- Extracting the rows and line items in an invoice.
- Extracting key clauses and prices from a contract.
Key features
Under the hood, an extract call first performs a /parse and then extracts your specified fields.
- schema: A JSON schema that details the specific fields and structure of your output.
- system_prompt: An overall system prompt that helps our models understand your document structure better.
- Special configurations: array_extract and generate_citations
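To make these parameters concrete, here is a minimal sketch that builds a JSON schema with field descriptions and bundles it with a system prompt and the two special configurations. The field names, the helper build_extract_options, and the flat layout of the resulting dict are assumptions for illustration, not the documented request format; check the API reference for the exact shape.

```python
import json

# Hypothetical field names for a statement that lists several accounts.
ACCOUNTS_SCHEMA = {
    "type": "object",
    "properties": {
        "accounts": {
            "type": "array",
            "description": "Every financial account listed on the statement.",
            "items": {
                "type": "object",
                "properties": {
                    "account_name": {
                        "type": "string",
                        "description": "Name of the account as printed.",
                    },
                    "account_number": {
                        "type": "string",
                        "description": "Account number, kept as text.",
                    },
                    "ending_balance": {
                        "type": "number",
                        "description": "Ending balance for the period.",
                    },
                },
                "required": ["account_name", "account_number"],
            },
        }
    },
    "required": ["accounts"],
}


def build_extract_options(schema, system_prompt,
                          array_extract=False, generate_citations=False):
    """Bundle the extract parameters into one dict.

    The flat layout is an assumption made for this sketch.
    """
    return {
        "schema": schema,
        "system_prompt": system_prompt,
        "array_extract": array_extract,
        "generate_citations": generate_citations,
    }


options = build_extract_options(
    ACCOUNTS_SCHEMA,
    system_prompt=(
        "This is a multi-page financial statement; "
        "one customer may hold several accounts."
    ),
    array_extract=True,
)

# The options are plain JSON-serializable data.
print(json.dumps(options, indent=2))
```

Keeping the schema as plain data makes it easy to version alongside your code and to reuse across documents of the same type.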
Example
Let’s say you want to extract all of a customer’s financial accounts from a statement. You can see the output in our playground example, but your schema and code might look like this:
Troubleshooting
Common problems and their solutions:
- Outputs differ between runs
  LLM outputs are non-deterministic. Variations are normal and usually minor.
- Only the first pages are processed
  Enable array_extract when working with long documents or large tables. You can also guide the model with a system prompt such as:
  “Make sure to process the entire document, not just the beginning.”
- Missing values from schema
  - Check whether the values appear in the parse step output.
  - If present, refine the system prompt or add better field descriptions.
  - If not present, improve the parse by adjusting configurations (e.g., OCR mode, layout settings).
  If you are extracting from a long list or table, try using the array type or enabling array_extract.
Advanced usage
- Different schemas by document type
  Use pipelining to classify documents first, then apply the correct schema. This also avoids duplicate parsing.
- Handling edge cases
  Use your system_prompt to highlight special cases. For example:
  “Pay close attention to clauses that mention early termination fees.”
- Conditional transformations
  Both the system_prompt and schema field descriptions can include transformation instructions. For example:
  “Prepend [Contains Example] if keyword X appears in this field.”
  However, do not include post-processing logic, as Reducto can only extract what is on the document.
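The patterns above can be combined in a small lookup keyed by document type. This is a sketch under stated assumptions: the type labels, field names, and the helper select_config are hypothetical, and the classification label is assumed to come from an earlier pipeline step rather than from this code.

```python
# Hypothetical per-type configuration: each entry pairs a system prompt
# with a schema whose field descriptions carry extra instructions.
CONFIG_BY_TYPE = {
    "invoice": {
        "system_prompt": (
            "Make sure to process the entire document, not just the beginning."
        ),
        "schema": {
            "type": "object",
            "properties": {
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"},
                        },
                    },
                }
            },
        },
    },
    "contract": {
        "system_prompt": (
            "Pay close attention to clauses that mention early termination fees."
        ),
        "schema": {
            "type": "object",
            "properties": {
                "termination_clause": {
                    "type": "string",
                    # Transformation instruction placed in the description.
                    "description": (
                        "Text of the termination clause. Prepend "
                        "[Contains Example] if keyword X appears in this field."
                    ),
                }
            },
        },
    },
}


def select_config(doc_type):
    """Return the prompt and schema for a classified document type."""
    try:
        return CONFIG_BY_TYPE[doc_type]
    except KeyError:
        raise ValueError(
            f"No extraction config for document type: {doc_type!r}"
        ) from None


config = select_config("contract")
print(config["system_prompt"])
```

Failing loudly on an unknown type is a design choice for this sketch: it surfaces classification labels you have not written a schema for instead of silently applying the wrong one.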