The v2 configuration format (documented in this Legacy version) is deprecated. Please migrate to the 2025-10-14 version for the latest features and improvements.

Overview

The 2025-10-14 release introduces a restructured configuration format (v3) that provides better organization and clarity. This guide will help you migrate from the Legacy (v2) configuration format to the new format.

Key Changes

1. Input Parameter

The document_url parameter has been renamed to input for clarity:
Legacy (v2)
client.parse.run(document_url="https://example.com/doc.pdf")
2025-10-14 (v3)
client.parse.run(input="https://example.com/doc.pdf")

2. Configuration Structure Reorganization

The configuration options have been reorganized into more logical groupings:
  • enhance: AI-powered enhancements (agentic modes, figure summarization)
  • retrieval: RAG-focused settings (chunking, filtering, embedding optimization)
  • formatting: Output format controls (tables, page markers, markup)
  • spreadsheet: Spreadsheet-specific settings
  • settings: General settings (OCR system, timeouts, passwords)

Complete Mapping Reference

Parse Configuration

Basic Options → Multiple Categories

Legacy (v2) | 2025-10-14 (v3) | Notes
document_url | input | Renamed for clarity
options.ocr_mode="agentic" | enhance.agentic=[{"scope": "text"}] | Agentic text mode
options.extraction_mode | Removed | No longer configurable
options.chunking | retrieval.chunking | Moved to retrieval category
options.table_summary.enabled | retrieval.embedding_optimized | Simplified to boolean
options.figure_summary.enabled=True | enhance.summarize_figures=True | Moved to enhance
options.figure_summary.enhanced=True | enhance.agentic=[{"scope": "figure"}] | Now uses agentic
options.figure_summary.prompt | enhance.agentic=[{"scope": "figure", "prompt": "..."}] | Custom prompting
options.filter_blocks | retrieval.filter_blocks | Moved to retrieval
options.force_url_result | settings.force_url_result | Moved to settings
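
The basic-option mappings above can be sketched as a small translation helper. This is a minimal illustration, not part of the SDK: the function name migrate_basic_options is hypothetical, and it assumes a custom figure prompt should always produce an agentic figure entry.

```python
from typing import Any


def migrate_basic_options(options: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Translate a v2 `options` dict into v3 keyword groups (hypothetical helper)."""
    v3: dict[str, dict[str, Any]] = {"enhance": {}, "retrieval": {}, "settings": {}}
    agentic: list[dict[str, Any]] = []

    # options.ocr_mode="agentic" -> enhance.agentic=[{"scope": "text"}]
    if options.get("ocr_mode") == "agentic":
        agentic.append({"scope": "text"})

    # figure_summary splits into summarize_figures plus an agentic figure entry
    figure = options.get("figure_summary", {})
    if figure.get("enabled"):
        v3["enhance"]["summarize_figures"] = True
    if figure.get("enhanced") or figure.get("prompt"):
        entry: dict[str, Any] = {"scope": "figure"}
        if figure.get("prompt"):
            entry["prompt"] = figure["prompt"]
        agentic.append(entry)

    # Options that only moved to a new category
    for key in ("chunking", "filter_blocks"):
        if key in options:
            v3["retrieval"][key] = options[key]
    if "force_url_result" in options:
        v3["settings"]["force_url_result"] = options["force_url_result"]

    # table_summary.enabled collapses to a single boolean
    if "table_summary" in options:
        enabled = options["table_summary"].get("enabled")
        v3["retrieval"]["embedding_optimized"] = bool(enabled)

    # extraction_mode was removed in v3, so it is intentionally dropped here.
    if agentic:
        v3["enhance"]["agentic"] = agentic
    return {group: cfg for group, cfg in v3.items() if cfg}
```

The returned dictionary is keyed by v3 group name, so it could be unpacked into a call as keyword arguments (for example `client.parse.run(input=upload, **migrated)`), assuming the client accepts these groups as shown in the examples later in this guide.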

Advanced Options → Multiple Categories

Legacy (v2) | 2025-10-14 (v3) | Notes
advanced_options.ocr_system="highres" | settings.ocr_system="standard" | Values changed
advanced_options.ocr_system="multilingual" | settings.ocr_system="standard" | Now uses standard
advanced_options.ocr_system="legacy" | settings.ocr_system="legacy" | Same
advanced_options.table_output_format | formatting.table_output_format | Moved to formatting
advanced_options.merge_tables | formatting.merge_tables | Moved to formatting
advanced_options.add_page_markers | formatting.add_page_markers | Moved to formatting
advanced_options.page_range | settings.page_range | Moved to settings
advanced_options.document_password | settings.document_password | Moved to settings
advanced_options.read_comments=True | formatting.include=["comments"] | Now in list
advanced_options.enable_change_tracking=True | formatting.include=["change_tracking"] | Now in list
advanced_options.enable_highlight_detection=True | formatting.include=["highlight"] | Now in list
advanced_options.persist_results | settings.persist_results | Moved to settings
advanced_options.return_ocr_data | settings.return_ocr_data | Moved to settings
advanced_options.large_table_chunking | spreadsheet.split_large_tables | Moved to spreadsheet
advanced_options.spreadsheet_table_clustering="default" | spreadsheet.clustering="fast" | Values changed
advanced_options.spreadsheet_table_clustering="intelligent" | spreadsheet.clustering="accurate" | Values changed
advanced_options.spreadsheet_table_clustering="disabled" | spreadsheet.clustering="disabled" | Same
advanced_options.include_formula_information=True | spreadsheet.include=["formula"] | Now in list
advanced_options.include_color_information=True | spreadsheet.include=["cell_colors"] | Now in list
advanced_options.exclude_hidden_sheets=True | spreadsheet.exclude=["hidden_sheets"] | Now in list
advanced_options.exclude_hidden_rows_cols=True | spreadsheet.exclude=["hidden_rows", "hidden_cols"] | Now in list
advanced_options.force_file_extension | settings.force_file_extension | Moved to settings
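
Because the advanced options follow three patterns (value remaps, boolean flags that become list entries, and plain moves), one way to apply the table is with lookup dictionaries. The sketch below is illustrative only; migrate_advanced_options and the constant names are hypothetical, and raising on unknown keys is a design choice rather than SDK behavior.

```python
from typing import Any

# Value remaps taken from the table above
OCR_SYSTEM = {"highres": "standard", "multilingual": "standard", "legacy": "legacy"}
CLUSTERING = {"default": "fast", "intelligent": "accurate", "disabled": "disabled"}

# v2 boolean flag -> (v3 group, v3 list key, entries appended when the flag is True)
FLAG_TO_LIST = {
    "read_comments": ("formatting", "include", ["comments"]),
    "enable_change_tracking": ("formatting", "include", ["change_tracking"]),
    "enable_highlight_detection": ("formatting", "include", ["highlight"]),
    "include_formula_information": ("spreadsheet", "include", ["formula"]),
    "include_color_information": ("spreadsheet", "include", ["cell_colors"]),
    "exclude_hidden_sheets": ("spreadsheet", "exclude", ["hidden_sheets"]),
    "exclude_hidden_rows_cols": ("spreadsheet", "exclude", ["hidden_rows", "hidden_cols"]),
}

# v2 key -> (v3 group, v3 key) for options whose value is unchanged
MOVED = {
    "table_output_format": ("formatting", "table_output_format"),
    "merge_tables": ("formatting", "merge_tables"),
    "add_page_markers": ("formatting", "add_page_markers"),
    "page_range": ("settings", "page_range"),
    "document_password": ("settings", "document_password"),
    "persist_results": ("settings", "persist_results"),
    "return_ocr_data": ("settings", "return_ocr_data"),
    "force_file_extension": ("settings", "force_file_extension"),
    "large_table_chunking": ("spreadsheet", "split_large_tables"),
}


def migrate_advanced_options(advanced: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Translate a v2 `advanced_options` dict into v3 groups (hypothetical helper)."""
    v3: dict[str, dict[str, Any]] = {}
    for key, value in advanced.items():
        if key == "ocr_system":
            v3.setdefault("settings", {})["ocr_system"] = OCR_SYSTEM[value]
        elif key == "spreadsheet_table_clustering":
            v3.setdefault("spreadsheet", {})["clustering"] = CLUSTERING[value]
        elif key in FLAG_TO_LIST:
            if value:  # a False flag simply contributes no list entry
                group, list_key, entries = FLAG_TO_LIST[key]
                v3.setdefault(group, {}).setdefault(list_key, []).extend(entries)
        elif key in MOVED:
            group, new_key = MOVED[key]
            v3.setdefault(group, {})[new_key] = value
        else:
            # Fail loudly rather than silently dropping an unmapped option
            raise KeyError(f"no v3 mapping known for advanced option {key!r}")
    return v3
```

Failing on unknown keys keeps a migration script from quietly discarding configuration that has no entry in the table.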

Experimental Options → Multiple Categories

Legacy (v2) | 2025-10-14 (v3) | Notes
experimental_options.enrich.enabled=True, mode="table" | enhance.agentic=[{"scope": "table"}] | Now uses agentic
experimental_options.enrich.prompt | enhance.agentic=[{"scope": "table", "prompt": "..."}] | Custom prompting
experimental_options.return_figure_images=True | settings.return_images=["figure"] | Now in list
experimental_options.return_table_images=True | settings.return_images=["table"] | Now in list
experimental_options.embed_text_metadata_pdf | settings.embed_pdf_metadata | Renamed
experimental_options.timeout | settings.timeout | Moved to settings
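
The two image flags above merge into a single settings.return_images list, which is easy to get wrong when both are set. A minimal sketch of the experimental-option mapping follows; migrate_experimental_options is a hypothetical name, and the code assumes the enrich prompt lives at enrich["prompt"] as the table suggests.

```python
from typing import Any


def migrate_experimental_options(experimental: dict[str, Any]) -> dict[str, dict[str, Any]]:
    """Translate a v2 `experimental_options` dict into v3 groups (hypothetical helper)."""
    enhance: dict[str, Any] = {}
    settings: dict[str, Any] = {}

    # enrich (table mode) becomes an agentic entry scoped to tables
    enrich = experimental.get("enrich", {})
    if enrich.get("enabled") and enrich.get("mode") == "table":
        entry: dict[str, Any] = {"scope": "table"}
        if enrich.get("prompt"):
            entry["prompt"] = enrich["prompt"]
        enhance["agentic"] = [entry]

    # Both image flags feed one list instead of two booleans
    images = []
    if experimental.get("return_figure_images"):
        images.append("figure")
    if experimental.get("return_table_images"):
        images.append("table")
    if images:
        settings["return_images"] = images

    # One rename and one plain move
    if "embed_text_metadata_pdf" in experimental:
        settings["embed_pdf_metadata"] = experimental["embed_text_metadata_pdf"]
    if "timeout" in experimental:
        settings["timeout"] = experimental["timeout"]

    groups = {"enhance": enhance, "settings": settings}
    return {name: cfg for name, cfg in groups.items() if cfg}
```

If the same request also sets ocr_mode="agentic" or an enhanced figure summary, the resulting enhance.agentic entries need to be combined into one list, as Example 2 below shows.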

Extract Configuration

Legacy (v2) | 2025-10-14 (v3) | Notes
document_url | input | Renamed
schema | instructions.schema | Nested in instructions
system_prompt | instructions.system_prompt | Nested in instructions
parse_config | parsing | Renamed (uses ParseOptions)
include_images | settings.include_images | Moved to settings
generate_citations | settings.citations.enabled | Nested in citations
array_extract | settings.array_extract | Moved to settings
options.numerical_confidence | settings.citations.numerical_confidence | Nested in citations
latency_sensitive | settings.optimize_for_latency | Renamed
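
As an illustration of the extract mapping, the sketch below rewrites a dictionary of v2 keyword arguments into the v3 shape. The helper name migrate_extract_kwargs is hypothetical, and parse_config is passed through unchanged here even though its contents would need their own migration using the parse tables above.

```python
from typing import Any


def migrate_extract_kwargs(v2: dict[str, Any]) -> dict[str, Any]:
    """Translate v2 extract keyword arguments into the v3 layout (hypothetical helper)."""
    v3: dict[str, Any] = {}
    instructions: dict[str, Any] = {}
    settings: dict[str, Any] = {}
    citations: dict[str, Any] = {}

    if "document_url" in v2:
        v3["input"] = v2["document_url"]

    # schema and system_prompt are nested under instructions
    for key in ("schema", "system_prompt"):
        if key in v2:
            instructions[key] = v2[key]

    # Renamed only; the nested parse options still need migrating separately
    if "parse_config" in v2:
        v3["parsing"] = v2["parse_config"]

    # Plain moves into settings
    for key in ("include_images", "array_extract"):
        if key in v2:
            settings[key] = v2[key]
    if "latency_sensitive" in v2:
        settings["optimize_for_latency"] = v2["latency_sensitive"]

    # Citation-related flags are grouped under settings.citations
    if "generate_citations" in v2:
        citations["enabled"] = v2["generate_citations"]
    numerical = v2.get("options", {}).get("numerical_confidence")
    if numerical is not None:
        citations["numerical_confidence"] = numerical

    if citations:
        settings["citations"] = citations
    if instructions:
        v3["instructions"] = instructions
    if settings:
        v3["settings"] = settings
    return v3
```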

Extract Response Format

The extract response format has changed significantly:
Legacy (v2)
{
  "result": [{"field1": "value1", "field2": "value2"}],
  "citations": [{"field1": [...], "field2": [...]}]
}
2025-10-14 (v3)
{
  "result": {
    "field1": {
      "value": "value1",
      "citations": [...]
    },
    "field2": {
      "value": "value2",
      "citations": [...]
    }
  }
}
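
Code that consumed the legacy response as separate value and citation structures can bridge the change with a small adapter. The sketch below assumes a flat schema in which every top-level field carries "value" and "citations" keys, exactly as in the sample above; nested objects or arrays in a real schema may need recursive handling. The function name split_extract_result is hypothetical.

```python
from typing import Any


def split_extract_result(response: dict[str, Any]) -> tuple[dict[str, Any], dict[str, list]]:
    """Split a v3 extract response into plain values and per-field citations."""
    result = response["result"]
    # Pull the bare value out of each field wrapper
    values = {name: field["value"] for name, field in result.items()}
    # Citations may be absent when they were not requested, so default to []
    citations = {name: field.get("citations", []) for name, field in result.items()}
    return values, citations


# Example with a response shaped like the v3 sample above
response = {
    "result": {
        "field1": {"value": "value1", "citations": [{"page": 1}]},
        "field2": {"value": "value2", "citations": []},
    }
}
values, citations = split_extract_result(response)
print(values["field1"])     # value1
print(citations["field1"])  # [{'page': 1}]
```

The "page" key inside the citation entry is a placeholder for illustration; the actual citation fields are not specified in this guide.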

Migration Examples

Example 1: Basic Parse with Agentic OCR

Legacy (v2)
result = client.parse.run(
    document_url=upload,
    options={
        "ocr_mode": "agentic"
    }
)
2025-10-14 (v3)
result = client.parse.run(
    input=upload,
    enhance={
        "agentic": [{"scope": "text"}]
    }
)

Example 2: Parse with Multiple Configurations

Legacy (v2)
result = client.parse.run(
    document_url=upload,
    options={
        "ocr_mode": "agentic",
        "chunking": {"chunk_mode": "variable"},
        "table_summary": {"enabled": True},
        "figure_summary": {"enabled": True, "enhanced": True}
    },
    advanced_options={
        "ocr_system": "multilingual",
        "table_output_format": "html",
        "page_range": {"start": 1, "end": 10},
        "enable_change_tracking": True
    },
    experimental_options={
        "enrich": {"enabled": True, "mode": "table"}
    }
)
2025-10-14 (v3)
result = client.parse.run(
    input=upload,
    enhance={
        "agentic": [
            {"scope": "text"},
            {"scope": "figure"},
            {"scope": "table"}
        ],
        "summarize_figures": True
    },
    retrieval={
        "chunking": {"chunk_mode": "variable"},
        "embedding_optimized": True
    },
    formatting={
        "table_output_format": "html",
        "include": ["change_tracking"]
    },
    settings={
        "ocr_system": "standard",
        "page_range": {"start": 1, "end": 10}
    }
)

Example 3: Extract Configuration

Legacy (v2)
result = client.extract.run(
    document_url=upload,
    schema=my_schema,
    system_prompt="Be precise and thorough.",
    generate_citations=True,
    array_extract=True,
    options={"numerical_confidence": True}
)
2025-10-14 (v3)
result = client.extract.run(
    input=upload,
    instructions={
        "schema": my_schema,
        "system_prompt": "Be precise and thorough."
    },
    settings={
        "array_extract": True,
        "citations": {
            "enabled": True,
            "numerical_confidence": True
        }
    }
)

Example 4: Spreadsheet Processing

Legacy (v2)
result = client.parse.run(
    document_url=upload,
    advanced_options={
        "large_table_chunking": {"enabled": True, "size": 100},
        "spreadsheet_table_clustering": "intelligent",
        "include_formula_information": True,
        "include_color_information": True,
        "exclude_hidden_sheets": True
    }
)
2025-10-14 (v3)
result = client.parse.run(
    input=upload,
    spreadsheet={
        "split_large_tables": {"enabled": True, "size": 100},
        "clustering": "accurate",
        "include": ["formula", "cell_colors"],
        "exclude": ["hidden_sheets"]
    }
)

Async Configuration

The async configuration keeps a similar structure, but async_config is replaced by the async parameter and the webhook is passed as a plain dictionary with an explicit mode:
Legacy (v2)
from reducto.models import WebhookConfig

result = client.parse.run_async(
    document_url=upload,
    async_config={
        "webhook": WebhookConfig(url="https://example.com/webhook"),
        "priority": True
    }
)
2025-10-14 (v3)
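result = client.parse.run_async(
    input=upload,
    # `async` is a reserved keyword in Python, so writing `async={...}` is a
    # SyntaxError. The trailing-underscore spelling below is an assumption;
    # confirm the exact keyword name in the SDK reference for your version.
    async_={
        "webhook": {"mode": "direct", "url": "https://example.com/webhook"},
        "priority": True
    }
)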

Breaking Changes Checklist

When migrating your code, make sure to:
  • Replace all document_url with input
  • Move ocr_mode="agentic" to enhance.agentic=[{"scope": "text"}]
  • Update ocr_system values (highres/multilingual → standard)
  • Replace table_summary.enabled with retrieval.embedding_optimized
  • Move figure/table enhancements to enhance.agentic
  • Convert boolean flags to list entries where applicable (e.g., enable_change_tracking=True → formatting.include=["change_tracking"])
  • Update spreadsheet clustering values (default → fast, intelligent → accurate)
  • Restructure extract response handling to use nested value/citations format
  • Move extract schema and system_prompt into instructions object
  • Update citation handling in extract to use the new nested format
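
To help work through the checklist above, a simple text scan can flag lines that still use legacy names. This is a rough sketch: the find_legacy_usages helper and its hint table are hypothetical, the list of names is drawn from the mapping tables in this guide and is not exhaustive, and plain word matching can produce false positives (for example, in comments).

```python
import re

# Legacy identifier -> short migration hint (subset taken from this guide)
LEGACY_HINTS = {
    "document_url": "rename to input",
    "ocr_mode": "use enhance.agentic=[{'scope': 'text'}]",
    "advanced_options": "split across formatting, settings, spreadsheet",
    "experimental_options": "split across enhance and settings",
    "table_summary": "use retrieval.embedding_optimized",
    "figure_summary": "use enhance.summarize_figures / enhance.agentic",
    "spreadsheet_table_clustering": "use spreadsheet.clustering with new values",
    "generate_citations": "use settings.citations.enabled",
    "parse_config": "rename to parsing",
    "latency_sensitive": "use settings.optimize_for_latency",
    "async_config": "use the async parameter",
}


def find_legacy_usages(source: str) -> list[tuple[int, str, str]]:
    """Return (line number, legacy name, hint) for each legacy name found."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, hint in LEGACY_HINTS.items():
            # Word boundaries avoid matching longer identifiers that contain the name
            if re.search(rf"\b{re.escape(name)}\b", line):
                hits.append((lineno, name, hint))
    return hits


sample = 'client.parse.run(\n    document_url=upload,\n    options={"ocr_mode": "agentic"},\n)'
for lineno, name, hint in find_legacy_usages(sample):
    print(f"line {lineno}: {name} -> {hint}")
```

A scan like this only locates candidates; response-handling changes such as the nested value/citations format still need a manual review.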

Need Help?

If you encounter issues during migration:
  1. Check the API Reference for the 2025-10-14 version
  2. Review the configuration examples in the new version
  3. Contact support at support@reducto.ai