When to Use
- Your schema has array fields with many items (50+)
- Extraction results look truncated or end abruptly
- Tables span multiple pages
How It Works
- Segment the document into overlapping page ranges
- Extract array items from each segment independently
- Merge all array items together
- Deduplicate items that appeared in overlapping regions
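The four steps above can be sketched in plain Python. This is an illustration under stated assumptions, not Reducto's implementation: the helper names (`segment_pages`, `item_key`, `merge_and_dedupe`, `extract_array`) are hypothetical, the per-page "extraction" is simulated with a dict of already-known rows, and exact matching on normalized field values stands in for the content-similarity check.

```python
from typing import Any


def segment_pages(num_pages: int, size: int, overlap: int) -> list[tuple[int, int]]:
    """Return inclusive 1-based (start, end) page ranges sharing `overlap` pages."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    ranges = []
    start = 1
    while start <= num_pages:
        end = min(start + size - 1, num_pages)
        ranges.append((start, end))
        if end == num_pages:
            break
        # Step back by `overlap` pages so the next segment re-reads them.
        start = end - overlap + 1
    return ranges


def item_key(item: dict[str, Any]) -> tuple:
    """Hashable key from normalized field values (assumed stand-in for similarity)."""
    return tuple(sorted((k, str(v).strip().lower()) for k, v in item.items()))


def merge_and_dedupe(per_segment: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Concatenate segment results in order, keeping the first copy of each item."""
    seen = set()
    merged = []
    for items in per_segment:
        for item in items:
            key = item_key(item)
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


def extract_array(pages: dict[int, list[dict[str, Any]]], size: int = 3, overlap: int = 1):
    """Simulated pipeline: segment, 'extract' each segment, merge, deduplicate."""
    segments = segment_pages(len(pages), size, overlap)
    per_segment = [
        [row for page in range(start, end + 1) for row in pages[page]]
        for start, end in segments
    ]
    return merge_and_dedupe(per_segment)


# Five pages, one row each; page 3 falls in both segments (1-3 and 3-5).
pages = {p: [{"line_number": p, "amount": "10.00"}] for p in range(1, 6)}
rows = extract_array(pages)
```

With these inputs the row on page 3 is read twice and kept once, so `rows` holds five items. If the rows carried only `amount`, every row would share the same key and collapse into one, which is why a distinguishing field matters for this kind of key.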
Schema Requirements
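The requirement in this section, a top-level array property, can be checked mechanically. The sketch below writes a JSON Schema as a Python dict and lists the array properties found directly under `properties`. The helper `top_level_array_fields` and all field names (`transactions`, `line_number`, and so on) are hypothetical, chosen only for illustration.

```python
def top_level_array_fields(schema: dict) -> list[str]:
    """Return names of properties declared as arrays directly under `properties`."""
    props = schema.get("properties", {})
    return [
        name
        for name, spec in props.items()
        if isinstance(spec, dict) and spec.get("type") == "array"
    ]


# Hypothetical bank-statement schema with one top-level array.
schema = {
    "type": "object",
    "properties": {
        "statement_date": {"type": "string"},
        "transactions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "line_number": {"type": "integer"},
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

# An array nested inside another object does not count as top-level.
nested_only = {
    "type": "object",
    "properties": {
        "account": {
            "type": "object",
            "properties": {"transactions": {"type": "array"}},
        },
    },
}

array_fields = top_level_array_fields(schema)
nested_fields = top_level_array_fields(nested_only)
```

Here `array_fields` contains only `transactions`, while `nested_fields` is empty because the array sits one level down.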
Your schema must have at least one top-level array property.
Deduplication
When segments overlap, the same item may be extracted twice. Reducto deduplicates using content similarity.
Problem: If your document has legitimately identical items (for example, two transactions with the same date and amount), deduplication might incorrectly merge them.
Solution: Add distinguishing fields such as line numbers or IDs.
With Citations
Array extraction works with citations. Each item retains its source location.
Troubleshooting
Still missing items:
- Check the Parse output first (client.parse.run). Extract can only find what Parse sees.
- Add a system prompt: “Extract every item. Do not skip any rows.”
- For critical data, use Agent-in-the-Loop, which iteratively verifies completeness.
- Confirm that a "type": "array" property exists at the top level of your schema.