This feature is still in beta; configuration options and behavior are subject to change.

Overview

Agent-in-the-loop extraction is an advanced feature that uses AI agents to intelligently review and refine extraction results. This feature is specifically designed for schemas with arrays, where you need to extract all line items such as:
  • Transaction lists from financial statements
  • Holdings or portfolio items from investment reports
  • Invoice line items from billing documents
  • Product listings from catalogs or inventory reports
The agent-in-the-loop system is particularly valuable when you cannot afford to miss any items in these arrays. It works by having an AI agent methodically review each page’s extracted data against the original document, identify missing or incorrect items, and make corrections iteratively until all line items are accurately captured.
During the current beta period, agent-in-the-loop extraction is billed at the same rate as standard extraction. For detailed information on credit usage rates, please refer to our Credit Usage and Rates documentation.

Requirements and Limitations

Requirements

  • PDF Documents: Currently only supports PDF input files
  • Array Schema: Your schema must contain at least one top-level array field
  • Full Document Processing: Processes the entire document (page range restrictions not supported)
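
The PDF requirement above can be checked locally before a document is submitted. The following is a minimal sketch, not part of the Reducto API: the helper name `looks_like_pdf` is hypothetical, and it assumes the file is available on local disk and begins with the standard `%PDF-` header.

```python
def looks_like_pdf(path: str) -> bool:
    """Return True if the file begins with the PDF header bytes '%PDF-'.

    Hypothetical pre-flight helper. This is a strict check: it only looks at
    the first five bytes and does not validate the rest of the file.
    """
    with open(path, "rb") as fh:
        return fh.read(5) == b"%PDF-"


# Example usage (the path is a placeholder):
# if not looks_like_pdf("statement.pdf"):
#     raise ValueError("agent-in-the-loop extraction currently supports PDF input only")
```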

Limitations

  • Processing Time: Significantly longer processing time due to iterative refinement
  • Cost: Higher costs due to multiple AI model calls per page
  • File Format: Limited to PDF documents only
  • Single Array Focus: Focuses refinement on one primary top-level array field at a time
  • Citations: Citations are not currently supported with agent-in-the-loop extraction

Best Practices

Schema Design

Ensure your schema has a clear top-level array that represents the items you want the agent to focus on:
{
    "type": "object",
    "properties": {
        "transactions": {  // This will be the focus of agent refinement; fields_to_verify must be set to ["transactions"]
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                    "description": {"type": "string"}
                }
            }
        },
        "document_summary": {  // This won't be refined by the agent
            "type": "string"
        }
    }
}
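
To see which keys in a schema qualify as top-level arrays, a short local check can help. This is an illustrative sketch only: `top_level_array_fields` is a hypothetical helper, not a Reducto function, and it assumes the schema is a plain Python dict in the shape shown above.

```python
def top_level_array_fields(schema: dict) -> list[str]:
    """Return the names of properties at the schema root whose type is 'array'."""
    properties = schema.get("properties", {})
    return [
        name
        for name, spec in properties.items()
        if isinstance(spec, dict) and spec.get("type") == "array"
    ]


schema = {
    "type": "object",
    "properties": {
        "transactions": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "date": {"type": "string"},
                    "amount": {"type": "number"},
                    "description": {"type": "string"},
                },
            },
        },
        "document_summary": {"type": "string"},
    },
}

print(top_level_array_fields(schema))  # ['transactions']
```

Only `transactions` is returned, so it is the only key in this schema that could be named in `fields_to_verify`.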

System Prompts

Provide clear, specific system prompts that help the agent understand:
  • What constitutes a valid item vs. summary/header rows
  • How to handle edge cases in your document type
  • The level of precision required
  • Details about how to identify the items
"system_prompt": "Extract individual transaction line items only. Exclude summary rows, headers, and totals. Each transaction must have a unique date, amount, and description. Transactions are in tables following a 'Transactions' header"

Fields to Verify

The fields_to_verify must match the name of the key of the top-level array in the schema. Currently only one field can be checked, so the length of fields_to_verify must be equal to 1:
"fields_to_verify": ["transactions"]  # Clear and specific, matches the schema above
# vs
"fields_to_verify": ["items", "document_summary"]  # Invalid: "items" is not a key in the schema, "document_summary" is not an array, and only one field is allowed
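
The two rules above (exactly one entry, and it must name a top-level array in the schema) can be checked before a request is sent. The sketch below is an assumption about how one might do this client-side; `check_fields_to_verify` is a hypothetical helper and does not reflect how the service itself validates requests.

```python
def check_fields_to_verify(schema: dict, fields_to_verify: list[str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means the value looks valid."""
    problems = []
    if len(fields_to_verify) != 1:
        problems.append(
            f"expected exactly 1 field, got {len(fields_to_verify)}"
        )
    properties = schema.get("properties", {})
    for name in fields_to_verify:
        spec = properties.get(name)
        if spec is None:
            problems.append(f"'{name}' is not a top-level key in the schema")
        elif spec.get("type") != "array":
            problems.append(f"'{name}' is not an array field")
    return problems


# Abbreviated version of the schema from the Schema Design section.
schema = {
    "type": "object",
    "properties": {
        "transactions": {"type": "array", "items": {"type": "object"}},
        "document_summary": {"type": "string"},
    },
}

print(check_fields_to_verify(schema, ["transactions"]))  # []
for problem in check_fields_to_verify(schema, ["items", "document_summary"]):
    print(problem)
```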

Configuration

Basic Configuration

To enable agent-in-the-loop extraction, set the enabled flag to true and name the target array in fields_to_verify, both inside the agent_in_the_loop block of your extraction request:
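import os

import requests

# Assumes the API key is stored in the REDUCTO_API_KEY environment variable.
REDUCTO_API_KEY = os.environ["REDUCTO_API_KEY"]

headers = {"Authorization": f"Bearer {REDUCTO_API_KEY}"}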

schema = {
    "type": "object",
    "properties": {
        "invoice_line_items": {
            "type": "array",
            "description": "List of charges in an invoice table.",
            "items": {
                "type": "object",
                "properties": {
                    "item_name": {"type": "string"},
                    "item_cost": {"type": "number"},
                    "quantity": {"type": "number"}
                }
            }
        }
    }
}

extract_response = requests.post(
    "https://platform.reducto.ai/extract",
    json={
        "input": "YOUR_DOCUMENT_URL",
        "instructions": {
            "schema": schema,
            "system_prompt": "Be precise and thorough when extracting invoice line items.",
            "agent_in_the_loop": {
                "enabled": True,
                "fields_to_verify": ["invoice_line_items"]
            }
        }
    },
    headers=headers,
)
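
If the same request shape is reused across documents, the payload can be assembled by a small function. This is a sketch under stated assumptions: `build_extract_payload` is a hypothetical helper, and it only uses the request keys shown in the example above (`input`, `instructions`, `schema`, `system_prompt`, `agent_in_the_loop`, `enabled`, `fields_to_verify`). It does not send anything over the network.

```python
def build_extract_payload(
    document_url: str,
    schema: dict,
    array_field: str,
    system_prompt: str,
) -> dict:
    """Assemble the JSON body for an agent-in-the-loop extraction request."""
    return {
        "input": document_url,
        "instructions": {
            "schema": schema,
            "system_prompt": system_prompt,
            "agent_in_the_loop": {
                "enabled": True,
                # Currently limited to a single top-level array field.
                "fields_to_verify": [array_field],
            },
        },
    }


payload = build_extract_payload(
    document_url="YOUR_DOCUMENT_URL",
    schema={
        "type": "object",
        "properties": {
            "invoice_line_items": {"type": "array", "items": {"type": "object"}}
        },
    },
    array_field="invoice_line_items",
    system_prompt="Be precise and thorough when extracting invoice line items.",
)
```

The resulting dict can be passed as the `json=` argument of `requests.post`, in the same way as the inline payload above.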

Configuration Parameters

agent_in_the_loop

Parameter | Type | Default | Description
enabled | boolean | false | Enables agent-in-the-loop extraction
fields_to_verify | list[str] | [] | List of the fields that the agent will focus on for refinement. This currently only supports a single array field. The name of the array field must match the key of the top-level array field within the schema

Example Use Cases

Invoice Processing

{
    "instructions": {
        "schema": {
            "type": "object",
            "properties": {
                "line_items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "quantity": {"type": "number"},
                            "unit_price": {"type": "number"},
                            "total": {"type": "number"}
                        }
                    }
                }
            }
        },
        "system_prompt": "Extract each invoice line item precisely. Exclude tax lines, subtotals, and summary rows.",
        "agent_in_the_loop": {
            "enabled": True,
            "fields_to_verify": ["line_items"]
        }
    }
}

Financial Statements

{
    "instructions": {
        "schema": {
            "type": "object",
            "properties": {
                "transactions": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "date": {"type": "string"},
                            "account": {"type": "string"},
                            "debit": {"type": "number"},
                            "credit": {"type": "number"}
                        }
                    }
                }
            }
        },
        "system_prompt": "Extract individual journal entries. Each entry must have a date and either a debit or credit amount.",
        "agent_in_the_loop": {
            "enabled": True,
            "fields_to_verify": ["transactions"]
        }
    }
}
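
The rule stated in the system prompt above (each entry has a date and either a debit or a credit amount) can also be checked on the extracted items after the fact. The sketch below assumes the extracted `transactions` array has already been pulled out of the response as a list of dicts; the response format is not described here, and `invalid_journal_entries` is a hypothetical helper.

```python
def invalid_journal_entries(entries: list[dict]) -> list[int]:
    """Return the indexes of entries missing a date or both debit and credit."""
    bad = []
    for index, entry in enumerate(entries):
        has_date = bool(entry.get("date"))
        has_amount = (
            entry.get("debit") is not None or entry.get("credit") is not None
        )
        if not (has_date and has_amount):
            bad.append(index)
    return bad


# Made-up sample data for illustration only.
sample = [
    {"date": "2024-01-05", "account": "Cash", "debit": 100.0, "credit": None},
    {"date": "", "account": "Revenue", "debit": None, "credit": 100.0},
    {"date": "2024-01-06", "account": "Rent", "debit": None, "credit": None},
]

print(invalid_journal_entries(sample))  # [1, 2]
```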