Array Extraction
Extract large amounts of data from a document by breaking it into segments and running extraction on each segment.
Array Extraction is well suited for:
- Long documents whose fields are spread across many pages
- Dense/complex tables with many line items
In these cases, you'll want to enable array_extract, which helps return all of the information you need without missing pages.
Experimenting with your system prompt can also help with longer documents.
Under the hood, array_extract intelligently splits your long document into segments, runs extraction on each segment, and then merges the results back together, preserving the integrity of the original document in edge cases (e.g., tables that span page breaks).
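The snippet below is a minimal conceptual sketch of the split, extract, and merge flow described above. It is not the array_extract implementation: the fixed-size page windows with one page of overlap, the duplicate-dropping merge, and the names segment_pages, merge_items, array_extract_sketch, and toy_extract are all hypothetical, chosen only for illustration.

```python
from typing import Callable

Page = str   # hypothetical: one page of document text
Item = dict  # hypothetical: one extracted array element


def segment_pages(pages: list[Page], size: int, overlap: int) -> list[list[Page]]:
    """Split pages into windows of `size` pages, each sharing `overlap` pages with the previous one."""
    if size <= overlap:
        raise ValueError("size must be larger than overlap")
    step = size - overlap
    segments = []
    for start in range(0, len(pages), step):
        segments.append(pages[start:start + size])
        if start + size >= len(pages):  # last window already reaches the end
            break
    return segments


def merge_items(batches: list[list[Item]]) -> list[Item]:
    """Concatenate per-segment arrays, dropping exact duplicates from overlapping pages."""
    seen = set()
    merged = []
    for batch in batches:
        for item in batch:
            key = tuple(sorted(item.items()))  # assumes item values are hashable
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def array_extract_sketch(
    pages: list[Page],
    extract: Callable[[list[Page]], list[Item]],
    size: int = 3,
    overlap: int = 1,
) -> list[Item]:
    """Run `extract` on each segment and merge the resulting arrays in order."""
    return merge_items([extract(seg) for seg in segment_pages(pages, size, overlap)])


def toy_extract(segment: list[Page]) -> list[Item]:
    """Stand-in extractor: each 'name:qty' line on a page becomes one array item."""
    items = []
    for page in segment:
        for line in page.splitlines():
            name, qty = line.split(":")
            items.append({"name": name, "qty": int(qty)})
    return items


# Usage: the third page is shared by both windows, so its rows are extracted twice.
pages = ["a:1\nb:2", "c:3", "d:4\ne:5", "f:6"]
items = array_extract_sketch(pages, toy_extract)
print([item["name"] for item in items])  # ['a', 'b', 'c', 'd', 'e', 'f']
```

Overlapping windows are one simple way to keep a table that straddles a page break together in at least one segment; the cost is that rows on shared pages are extracted twice, so the merge step has to deduplicate them. Dropping exact duplicates, as this sketch does, would also collapse line items that are legitimately identical, so a production merge would need a more careful matching rule.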
Your schema needs an array item ([]) at the top level for array_extract to work. Otherwise, it will throw an error.