Introduction

Reducto applies a set of default parsing configurations to every request. These defaults are designed to cover the most common document types and workflows, so you can start parsing without specifying every parameter in your API calls. By understanding the defaults, you can:
  • Avoid sending long, repetitive configuration lists with each request
  • Decide when to override a setting for your specific use case
  • Debug extraction issues at the parse stage, since parse output is the foundation for extraction
This page lists the defaults, explains their purpose, and shows how to override them when needed.

Default configurations (Parse)

Start with defaults. Add overrides only where your workflow requires different behavior.
Setting | Default | Description
enhance.agentic | [] | List of agentic modes for AI-powered enhancements. Can include text, table, and figure scopes.
enhance.summarize_figures | true | Include AI-generated figure/image summaries.

Setting | Default | Description
retrieval.chunking.chunk_mode | variable | Chunking strategy for parsed text.
retrieval.chunking.chunk_size | null | The approximate size of chunks (in characters). Defaults to variable between 250-1500.
retrieval.filter_blocks | [] | Block types to exclude from output.
retrieval.embedding_optimized | false | Include AI-generated table summaries for better embedding.

Setting | Default | Description
formatting.add_page_markers | false | Insert page boundary markers into text.
formatting.table_output_format | dynamic | Table rendering format in output. Options: html, json, md, jsonbbox, dynamic, csv.
formatting.merge_tables | false | Merge adjacent tables on a page.
formatting.include | [] | List of markup to include. Options: change_tracking, highlight, comments.

Setting | Default | Description
spreadsheet.split_large_tables.enabled | true | Split very large tables into chunks.
spreadsheet.split_large_tables.size | 50 | Max rows per large-table chunk.
spreadsheet.clustering | accurate | Splits up tables inside spreadsheets. Options: accurate, fast, disabled.
spreadsheet.include | [] | List of spreadsheet features to include. Options: cell_colors, formula.
spreadsheet.exclude | [] | List of elements to exclude. Options: hidden_sheets, hidden_rows, hidden_cols.

Setting | Default | Description
settings.ocr_system | standard | OCR system preset. Options: standard, legacy.
settings.force_url_result | false | Force returning result via URL (vs inline JSON).
settings.force_file_extension | null | Force the URL to be downloaded as a specific file extension.
settings.return_ocr_data | false | Return low-level OCR words/lines data.
settings.return_images | [] | Return images for specified block types. Options: figure, table.
settings.embed_pdf_metadata | false | Write OCR text layer back into returned PDF.
settings.persist_results | false | Persist outputs for later retrieval.
settings.timeout | 900 | The timeout for the job in seconds.
settings.page_range | null | The page range to process (1-indexed).
settings.document_password | null | Password to decrypt password-protected documents.
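
The dotted names in the tables above correspond to nested dictionaries in the Python call: the first segment is the keyword argument and the remaining segments are nested keys. The sketch below shows that mapping. The helper name nest_settings is hypothetical and not part of the Reducto SDK, and the override values are arbitrary illustrations.

```python
from typing import Any


def nest_settings(dotted: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted setting names into nested dictionaries.

    Hypothetical helper for illustration; not part of the Reducto SDK.
    """
    nested: dict[str, Any] = {}
    for path, value in dotted.items():
        *parents, leaf = path.split(".")
        node = nested
        for key in parents:
            # Create intermediate dictionaries on demand.
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


overrides = nest_settings({
    "spreadsheet.split_large_tables.size": 100,  # arbitrary example value
    "spreadsheet.clustering": "fast",
    "settings.return_images": ["figure", "table"],
})
print(overrides)
```

The top-level keys of the result (spreadsheet, settings) line up with the keyword arguments of client.parse.run, so the dictionary could be passed as client.parse.run(input=SAMPLE_URL, **overrides). Whether a nested object such as split_large_tables may be sent with only some of its keys is an assumption here; check the Parse API Reference before relying on it.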

Defaults in action

These two calls are equivalent: the first sets the defaults explicitly, while the second relies on the built-in defaults.
Explicit configuration example
result = client.parse.run(
    input=SAMPLE_URL,
    enhance={
        "agentic": [],
        "summarize_figures": True
    },
    retrieval={
        "chunking": {"chunk_mode": "variable", "chunk_size": None},
        "filter_blocks": [],
        "embedding_optimized": False
    },
    formatting={
        "add_page_markers": False,
        "table_output_format": "dynamic",
        "merge_tables": False,
        "include": []
    },
    spreadsheet={
        "split_large_tables": {"enabled": True, "size": 50},
        "clustering": "accurate",
        "include": [],
        "exclude": []
    },
    settings={
        "ocr_system": "standard",
        "force_url_result": False,
        "return_ocr_data": False,
        "return_images": [],
        "embed_pdf_metadata": False,
        "persist_results": False,
        "timeout": 900
    }
)
Default configuration example
result = client.parse.run(input=SAMPLE_URL)
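
One way to keep requests short is to drop every value that already matches a default before sending. The following is a minimal client-side sketch of that idea, not a description of how Reducto resolves defaults internally. DEFAULTS copies the values from the explicit example above, strip_defaults is a hypothetical helper, and the changed values (merge_tables, timeout) are arbitrary.

```python
import copy
from typing import Any

# Default values copied from the explicit configuration example.
DEFAULTS: dict[str, Any] = {
    "enhance": {"agentic": [], "summarize_figures": True},
    "retrieval": {
        "chunking": {"chunk_mode": "variable", "chunk_size": None},
        "filter_blocks": [],
        "embedding_optimized": False,
    },
    "formatting": {
        "add_page_markers": False,
        "table_output_format": "dynamic",
        "merge_tables": False,
        "include": [],
    },
    "spreadsheet": {
        "split_large_tables": {"enabled": True, "size": 50},
        "clustering": "accurate",
        "include": [],
        "exclude": [],
    },
    "settings": {
        "ocr_system": "standard",
        "force_url_result": False,
        "return_ocr_data": False,
        "return_images": [],
        "embed_pdf_metadata": False,
        "persist_results": False,
        "timeout": 900,
    },
}

_MISSING = object()  # sentinel for keys that have no known default


def strip_defaults(config: dict[str, Any], defaults: dict[str, Any]) -> dict[str, Any]:
    """Return only the entries of config that differ from defaults (hypothetical helper)."""
    pruned: dict[str, Any] = {}
    for key, value in config.items():
        default = defaults.get(key, _MISSING)
        if isinstance(value, dict) and isinstance(default, dict):
            # Recurse into nested groups and keep them only if something differs.
            child = strip_defaults(value, default)
            if child:
                pruned[key] = child
        elif default is _MISSING or value != default:
            pruned[key] = value
    return pruned


explicit = copy.deepcopy(DEFAULTS)
print(strip_defaults(explicit, DEFAULTS))  # {}

explicit["formatting"]["merge_tables"] = True
explicit["settings"]["timeout"] = 300
minimal = strip_defaults(explicit, DEFAULTS)
print(minimal)
```

The fully explicit configuration reduces to an empty dictionary, which mirrors the equivalence of the two calls above. After two values change, only those two remain.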

When to Override Defaults

Most of the time, you can rely on the default configurations. However, use cases and document formats vary widely. Here are a few cases where an override helps:
  • Enable agentic processing
    For documents with complex tables or figures, use agentic modes to improve accuracy. Set enhance.agentic to a list of scope objects: [{"scope": "text"}] for text, [{"scope": "table"}] for tables, or [{"scope": "figure"}] for figures.
  • Return figures and tables
    Helpful for research papers or scientific documents where charts and illustrations are common. Set settings.return_images=["figure", "table"].
  • Different languages
    The default standard OCR system supports multilingual documents. Use settings.ocr_system="legacy" only for backwards compatibility with Germanic languages.
  • Track document changes
    For legal or compliance review, enable change tracking with formatting.include=["change_tracking"] to detect underlines and strikethroughs.
  • Process spreadsheets efficiently
    For large spreadsheets, adjust spreadsheet.split_large_tables.size to control chunk sizes, or use spreadsheet.clustering="fast" for faster processing at the cost of some accuracy.
Refer to our parsing best practices for more examples.
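
The overrides listed above can be sketched as keyword-argument dictionaries. The scenario names and the chunk size of 100 are hypothetical, chosen only for illustration. The setting names and option values come from the tables on this page.

```python
from typing import Any

# Hypothetical scenario names mapped to the overrides described above.
SCENARIO_OVERRIDES: dict[str, dict[str, Any]] = {
    "agentic_tables_and_figures": {
        "enhance": {"agentic": [{"scope": "table"}, {"scope": "figure"}]},
    },
    "return_figures_and_tables": {
        "settings": {"return_images": ["figure", "table"]},
    },
    "legacy_ocr": {
        "settings": {"ocr_system": "legacy"},
    },
    "change_tracking": {
        "formatting": {"include": ["change_tracking"]},
    },
    "large_spreadsheets": {
        "spreadsheet": {
            # 100 rows per chunk is an arbitrary example value.
            "split_large_tables": {"enabled": True, "size": 100},
            "clustering": "fast",
        },
    },
}

overrides = SCENARIO_OVERRIDES["change_tracking"]
print(overrides)

# With a configured client and a document URL, the call would look like:
# result = client.parse.run(input=SAMPLE_URL, **overrides)
```

Each entry sets only the values that differ from the defaults, so everything else keeps its default behavior.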

Next Steps

  • Learn more about all available parameters in the Parse API Reference.
  • Try different configurations interactively in the Studio Playground.
  • Continue to Extraction to see how parse output is used as the foundation for structured data.