Introduction

Reducto applies a set of default parsing configurations to every request. These defaults are designed to cover the most common document types and workflows, so you can start parsing without specifying every parameter in your API calls. By understanding the defaults, you can:
  • Avoid sending long, repetitive configuration lists with each request
  • Decide when to override a setting for your specific use case
  • Debug issues at the parse stage, since parse output is the foundation for extraction
This page lists the defaults, explains their purpose, and shows how to override them when needed.

Default configurations (Parse)

Start with defaults. Add overrides only where your workflow requires different behavior.
Setting | Default | Description
options.ocr_mode | standard | Whether to use Agentic OCR.
options.extraction_mode | ocr | Extraction source for text content.
options.chunking.chunk_mode | variable | Chunking strategy for parsed text.
options.table_summary.enabled | false | Include AI-generated table summaries.
options.figure_summary.enabled | false | Include AI-generated figure/image summaries.
options.filter_blocks | [] | Block types to exclude from output.
options.force_url_result | false | Force returning the result via URL (vs. inline JSON).

Setting | Default | Description
advanced_options.ocr_system | highres | OCR system preset.
advanced_options.table_output_format | html | Table rendering format in output.
advanced_options.merge_tables | false | Merge adjacent tables on a page.
advanced_options.include_formula_information | false | Include spreadsheet formula details.
advanced_options.continue_hierarchy | true | Preserve document hierarchy across chunks.
advanced_options.large_table_chunking.enabled | true | Split very large tables into chunks.
advanced_options.large_table_chunking.size | 50 | Max rows per large-table chunk.
advanced_options.spreadsheet_table_clustering | default | Splits up tables inside tables with multiple.
advanced_options.add_page_markers | false | Insert page boundary markers into text.
advanced_options.remove_text_formatting | false | Strip bold/italics and text styling.
advanced_options.return_ocr_data | false | Return low-level OCR words/lines data.
advanced_options.filter_line_numbers | false | Remove leading line numbers.
advanced_options.read_comments | false | Parse comments from the PDF.
advanced_options.persist_results | false | Persist outputs for later retrieval.
advanced_options.exclude_hidden_sheets | false | Skip hidden sheets in spreadsheets.
advanced_options.exclude_hidden_rows_cols | false | Skip hidden rows/columns in spreadsheets.
advanced_options.enable_change_tracking | false | Detect strikethrough and underlines, and add scripts.

Setting | Default | Description
experimental_options.enrich.enabled | false | Enable post-parse enrichment pass.
experimental_options.enrich.mode | standard | Block types to be enriched.
experimental_options.native_office_conversion | false | Use Windows VM instead of LibreOffice to convert files.
experimental_options.enable_checkboxes | false | Detect and return checkbox fields.
experimental_options.enable_equations | false | Detect and return math equations.
experimental_options.rotate_pages | true | Auto-rotate misoriented pages.
experimental_options.rotate_figures | false | Auto-rotate figures/images. This is separate from page rotation.
experimental_options.enable_scripts | false | Detect and return subscripts and superscripts.
experimental_options.return_figure_images | false | Return figure images.
experimental_options.return_table_images | false | Return table images.
experimental_options.layout_model | default | Layout analysis model. Beta is newer.
experimental_options.embed_text_metadata_pdf | false | Write OCR text layer back into returned PDF.
experimental_options.danger_filter_wide_boxes | false | Do not use. Filter overly wide bounding boxes.
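
To make the "start with defaults, add overrides" idea concrete, here is a minimal local sketch of how an override combines with defaults. It only models the behavior with plain dictionaries. The `DEFAULTS` dictionary (a small subset of the tables above) and the `deep_merge` helper are illustrative names, not part of the Reducto SDK, and the service's actual merge logic is not described on this page.

```python
from copy import deepcopy

# A handful of defaults copied from the tables above (not the full set).
DEFAULTS = {
    "options": {
        "ocr_mode": "standard",
        "extraction_mode": "ocr",
        "chunking": {"chunk_mode": "variable"},
    },
    "advanced_options": {
        "ocr_system": "highres",
        "table_output_format": "html",
        "large_table_chunking": {"enabled": True, "size": 50},
    },
    "experimental_options": {"rotate_pages": True, "layout_model": "default"},
}


def deep_merge(defaults, overrides):
    """Return a new dict in which overrides win, recursing into nested dicts."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Override a single nested setting; everything else keeps its default.
overrides = {"advanced_options": {"large_table_chunking": {"size": 100}}}
effective = deep_merge(DEFAULTS, overrides)
print(effective["advanced_options"]["large_table_chunking"])
# → {'enabled': True, 'size': 100}
```

Only the overridden key changes; sibling keys such as `enabled` keep their default values, which is why a request needs to carry just the settings that differ.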

Defaults in action

Both of these calls apply the same parsing configuration: the first spells out several of the defaults explicitly, while the second relies on the built-in defaults.
Explicit configuration example
# Assumes `client` is an initialized Reducto client and SAMPLE_URL points to a document.
result = client.parse.run(
    document_url=SAMPLE_URL,
    options={
        "ocr_mode": "standard",
        "extraction_mode": "ocr",
        "table_summary": {"enabled": False},
        "figure_summary": {"enabled": False},
    },
    advanced_options={
        "ocr_system": "highres",
        "table_output_format": "html",
        "continue_hierarchy": True,
        "large_table_chunking": {"enabled": True, "size": 50},
    },
    experimental_options={
        "rotate_pages": True,
        "layout_model": "default",
    },
    priority=True
)
Default configuration example
result = client.parse.run(document_url=SAMPLE_URL)
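
If you already have a long explicit configuration, one way to shrink it is to drop every value that matches a default and send only what remains. The sketch below is a hypothetical helper (`prune_defaults` is not an SDK function), and its `DEFAULTS` dictionary repeats only the settings used in the explicit example above.

```python
from copy import deepcopy

# Defaults for the settings that appear in the explicit example above.
DEFAULTS = {
    "options": {
        "ocr_mode": "standard",
        "extraction_mode": "ocr",
        "table_summary": {"enabled": False},
        "figure_summary": {"enabled": False},
    },
    "advanced_options": {
        "ocr_system": "highres",
        "table_output_format": "html",
        "continue_hierarchy": True,
        "large_table_chunking": {"enabled": True, "size": 50},
    },
    "experimental_options": {"rotate_pages": True, "layout_model": "default"},
}


def prune_defaults(config, defaults):
    """Keep only the entries of config that differ from (or are absent in) defaults."""
    pruned = {}
    for key, value in config.items():
        default = defaults.get(key)
        if isinstance(value, dict) and isinstance(default, dict):
            nested = prune_defaults(value, default)
            if nested:  # drop groups that end up empty
                pruned[key] = nested
        elif key not in defaults or value != default:
            pruned[key] = value
    return pruned


# Start from a fully explicit configuration and change one setting.
explicit = deepcopy(DEFAULTS)
explicit["experimental_options"]["rotate_pages"] = False

minimal = prune_defaults(explicit, DEFAULTS)
print(minimal)
# → {'experimental_options': {'rotate_pages': False}}
```

The resulting dictionary could then be unpacked into the call, for example `client.parse.run(document_url=SAMPLE_URL, **minimal)`, assuming `client` and `SAMPLE_URL` are defined as in the examples above.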

When to Override Defaults

Most of the time, you can rely on the default configurations. However, use cases and document formats vary widely. Here are a few cases where an override helps:
  • Disable page auto-rotation
    If you don’t expect scanned or skewed pages, you can disable rotate_pages to reduce latency.
  • Return figures and tables
    Helpful for research papers or scientific documents where charts and illustrations are common. Enable return_figure_images and return_table_images.
  • Different languages
    Use the multilingual mode of ocr_system for documents that contain non-Germanic languages.
Refer to our parsing best practices for more examples.
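
The three overrides above can be sketched as a small helper that builds only the keyword arguments that differ from the defaults. This is an illustration, not an official recipe: `build_overrides` is a hypothetical function, the grouping of each setting under `advanced_options` or `experimental_options` follows the tables on this page, and the value "multilingual" for `ocr_system` is inferred from the description above, so check it against the Parse API Reference.

```python
def build_overrides(expect_scans=True, want_images=False, multilingual=False):
    """Build parse keyword arguments containing only non-default settings."""
    advanced = {}
    experimental = {}

    if not expect_scans:
        # Skip page auto-rotation when pages are known to be upright.
        experimental["rotate_pages"] = False
    if want_images:
        # Return cropped images for figures and tables.
        experimental["return_figure_images"] = True
        experimental["return_table_images"] = True
    if multilingual:
        # Assumed value, inferred from "multilingual mode" in the text above.
        advanced["ocr_system"] = "multilingual"

    kwargs = {}
    if advanced:
        kwargs["advanced_options"] = advanced
    if experimental:
        kwargs["experimental_options"] = experimental
    return kwargs


# Example: a born-digital research paper with many charts.
overrides = build_overrides(expect_scans=False, want_images=True)
print(overrides)

# With an initialized client, the overrides could be passed as:
# result = client.parse.run(document_url=SAMPLE_URL, **overrides)
```

Calling `build_overrides()` with no arguments returns an empty dictionary, which makes the call equivalent to relying on the defaults alone.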

Next Steps

  • Learn more about all available parameters in the Parse API Reference.
  • Try different configurations interactively in the Studio Playground.
  • Continue to Extraction to see how parse output is used as the foundation for structured data.