Overview
Array extraction is a type of extraction that is great for:
- Long documents with fields across many pages
- Dense/complex tables with many line items
For these cases, use the array_extract functionality, which can help you return all of the information you need without missing pages. If your extraction output looks like it is missing items, we recommend array extraction.
Experimenting with your system prompt can also help with longer documents.
array_extract breaks up your long document intelligently, performs extraction on each segment, and then merges the results back together, preserving the integrity of the original content in edge cases (e.g. tables that span page breaks).
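The split, extract, and merge flow described above can be illustrated with a small sketch. This is not the actual array_extract implementation: the function names (split_pages, merge_items, array_extract_sketch, toy_extract), the page-overlap strategy, and the exact-match deduplication are all assumptions chosen only to show the general idea of keeping rows intact when a table crosses a segment boundary.

```python
from typing import Callable, Dict, List

Item = Dict[str, str]


def split_pages(pages: List[str], segment_size: int, overlap: int = 1) -> List[List[str]]:
    """Split pages into segments that share `overlap` pages with their neighbor.

    Hypothetical strategy: the overlap gives each boundary page a chance to be
    seen together with the pages on either side of it.
    """
    if segment_size <= overlap:
        raise ValueError("segment_size must be larger than overlap")
    step = segment_size - overlap
    segments = []
    for start in range(0, len(pages), step):
        segments.append(pages[start:start + segment_size])
        if start + segment_size >= len(pages):
            break  # the last segment already reaches the end of the document
    return segments


def merge_items(per_segment: List[List[Item]]) -> List[Item]:
    """Concatenate per-segment results, dropping exact duplicates from overlaps.

    Limitation of this toy: two genuinely identical line items are also merged.
    """
    merged, seen = [], set()
    for items in per_segment:
        for item in items:
            key = tuple(sorted(item.items()))
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def array_extract_sketch(
    pages: List[str],
    extract_segment: Callable[[List[str]], List[Item]],
    segment_size: int = 3,
    overlap: int = 1,
) -> List[Item]:
    """Run the (assumed) split -> extract -> merge pipeline."""
    segments = split_pages(pages, segment_size, overlap)
    return merge_items([extract_segment(segment) for segment in segments])


def toy_extract(segment: List[str]) -> List[Item]:
    """Stand-in extractor: every line of a page is a 'name,amount' row."""
    items = []
    for page in segment:
        for line in page.splitlines():
            name, amount = line.split(",")
            items.append({"name": name, "amount": amount})
    return items


pages = ["a,1\nb,2", "c,3", "d,4", "e,5", "f,6"]
result = array_extract_sketch(pages, toy_extract)
print([item["name"] for item in result])
```

In this sketch, page 3 falls into both segments, so its row is extracted twice and then collapsed during the merge. A real system would need a more careful merge rule than exact equality.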
Your schema needs to have an array item ([]) at the top level in order for array_extract to work. Otherwise, it will throw an error.
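To catch this requirement before sending a request, a local pre-check along the following lines may help. It is only a sketch under assumptions: it treats the schema as a JSON Schema object and interprets "top level" as a direct property of the root object whose type is array. The helper names and the example invoice schema are hypothetical and not part of any documented API.

```python
from typing import Any, Dict, List


def top_level_array_fields(schema: Dict[str, Any]) -> List[str]:
    """Return names of root-level properties declared as arrays.

    Assumes a JSON Schema style dict with a "properties" mapping at the root.
    """
    fields = []
    for name, spec in schema.get("properties", {}).items():
        if not isinstance(spec, dict):
            continue
        declared = spec.get("type")
        # JSON Schema allows "type" to be a string or a list of strings.
        if declared == "array" or (isinstance(declared, list) and "array" in declared):
            fields.append(name)
    return fields


def check_schema_for_array_extract(schema: Dict[str, Any]) -> List[str]:
    """Raise early if the schema has no top-level array (hypothetical pre-check)."""
    fields = top_level_array_fields(schema)
    if not fields:
        raise ValueError("schema has no array at the top level")
    return fields


# Hypothetical invoice schema: line_items is the top-level array.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

print(check_schema_for_array_extract(invoice_schema))
```

A schema whose only array is nested inside another object would fail this check, which matches the assumed reading of the top-level requirement.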