Array extraction is a mode specifically designed for extracting arrays (lists of items) from documents. It exists because LLMs have context limits that cause them to truncate long lists.
The core problem: if you ask an LLM to extract 500 transactions from a bank statement, it might return only the first 50-100 before stopping. Array extraction solves this by splitting the document into segments, extracting the array items from each segment, then merging results.
This only affects array fields in your schema. Scalar fields (strings, numbers, single objects) are still extracted from the full document context normally.
When to Use
- Your schema has array fields with many items (50+)
- Extraction results look truncated or end abruptly
- Tables span multiple pages
How It Works
- Segment the document into overlapping page ranges
- Extract array items from each segment independently
- Merge all array items together
- Deduplicate items that appeared in overlapping regions
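The four steps above can be sketched in a few lines. This is an illustration only, not Reducto's implementation: the segment size, the overlap, the function names, and the `extract_segment` callback are all assumptions, and exact equality of every field stands in for the content-similarity check.

```python
from typing import Callable, Iterable

Item = dict  # one extracted array item, e.g. a transaction row


def overlapping_page_ranges(total_pages: int, size: int = 10, overlap: int = 2) -> list[tuple[int, int]]:
    """Return 1-indexed inclusive (start, end) page ranges sharing `overlap` pages."""
    if size <= overlap:
        raise ValueError("size must be greater than overlap")
    ranges: list[tuple[int, int]] = []
    start = 1
    while start <= total_pages:
        end = min(start + size - 1, total_pages)
        ranges.append((start, end))
        if end == total_pages:
            break
        start = end - overlap + 1  # step back so neighbouring segments overlap
    return ranges


def merge_and_dedupe(per_segment: Iterable[list[Item]]) -> list[Item]:
    """Concatenate segment results in order, dropping exact repeats."""
    seen = set()
    merged: list[Item] = []
    for items in per_segment:
        for item in items:
            key = tuple(sorted(item.items()))  # assumes hashable scalar values
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def extract_array(total_pages: int,
                  extract_segment: Callable[[int, int], list[Item]],
                  size: int = 10, overlap: int = 2) -> list[Item]:
    """Segment, extract each segment independently, then merge and deduplicate."""
    ranges = overlapping_page_ranges(total_pages, size, overlap)
    return merge_and_dedupe(extract_segment(start, end) for start, end in ranges)


# Hypothetical stand-in for the per-segment LLM call: one row per page.
def fake_extract_segment(start: int, end: int) -> list[Item]:
    return [{"page": p, "amount": p * 10} for p in range(start, end + 1)]


rows = extract_array(25, fake_extract_segment)
print(overlapping_page_ranges(25))  # [(1, 10), (9, 18), (17, 25)]
print(len(rows))                    # 25
```

Pages 9, 10, 17, and 18 are each extracted twice in this run, and the merge step removes the repeats so that 25 rows remain.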
Schema Requirements
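A schema can be checked for a top-level array before calling Extract. The helper below is a hypothetical sketch and not part of the Reducto SDK; it assumes the schema is a JSON Schema object held as a Python dict, and the field names (`transactions`, `statement`) are illustrative.

```python
def top_level_array_fields(schema: dict) -> list[str]:
    """List the properties declared as arrays directly under the schema root."""
    found = []
    for name, spec in schema.get("properties", {}).items():
        declared = spec.get("type") if isinstance(spec, dict) else None
        # "type" may be a single string or a list such as ["array", "null"].
        if declared == "array" or (isinstance(declared, list) and "array" in declared):
            found.append(name)
    return found


# Array sits at the top level.
good_schema = {
    "type": "object",
    "properties": {
        "account_holder": {"type": "string"},
        "transactions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# Array is nested inside an object, so nothing is found at the top level.
nested_schema = {
    "type": "object",
    "properties": {
        "statement": {
            "type": "object",
            "properties": {"transactions": {"type": "array", "items": {"type": "object"}}},
        },
    },
}

print(top_level_array_fields(good_schema))    # ['transactions']
print(top_level_array_fields(nested_schema))  # []
```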
Your schema must have at least one top-level array property.
Deduplication
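As a concrete picture of deduplication, the sketch below uses exact equality across all fields as a stand-in for content similarity; the actual similarity measure is not described here, so that is an assumption. It shows two genuinely separate transactions collapsing into one, and the same pair surviving once a hypothetical `line_number` field is part of each item.

```python
def dedupe(items: list[dict]) -> list[dict]:
    """Keep the first occurrence of each item, comparing every field."""
    seen = set()
    unique = []
    for item in items:
        key = tuple(sorted(item.items()))  # assumes hashable scalar values
        if key not in seen:
            seen.add(key)
            unique.append(item)
    return unique


# Two real transactions that happen to share a date and an amount.
without_ids = [
    {"date": "2024-03-01", "amount": 25.0},
    {"date": "2024-03-01", "amount": 25.0},
]

# The same transactions with a distinguishing field (hypothetical name).
with_ids = [
    {"line_number": 14, "date": "2024-03-01", "amount": 25.0},
    {"line_number": 15, "date": "2024-03-01", "amount": 25.0},
]

print(len(dedupe(without_ids)))  # 1: the second transaction is lost
print(len(dedupe(with_ids)))     # 2: both transactions are kept
```

In schema terms, this corresponds to adding a property such as `line_number` to the array's item object so that each extracted row carries something unique.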
When segments overlap, the same item may be extracted twice. Reducto deduplicates using content similarity.
Problem: If your document has legitimately identical items (for example, two transactions with the same date and amount), deduplication might incorrectly merge them.
Solution: Add distinguishing fields like line numbers or IDs.
With Citations
Array extraction works with citations. Each item retains its source location.
Troubleshooting
Still missing items:
- Check the Parse output first (client.parse.run). Extract can only find what Parse sees.
- Add a system prompt: “Extract every item. Do not skip any rows.”
- Confirm that a "type": "array" property exists at the top level of your schema.