Introduction
Reducto applies a set of default parsing configurations to every request. These defaults are designed to cover the most common document types and workflows, so you can start parsing without specifying every parameter in your API calls. By understanding the defaults, you can:
- Avoid sending long, repetitive configuration lists with each request
- Decide when to override a setting for your specific use case
- Debug issues at the parse stage, since parse output is the foundation for extraction
Default configurations (Parse)
Start with defaults. Add overrides only where your workflow requires different behavior.
Enhance (AI-powered enhancements)
| Setting | Default | Description |
|---|---|---|
| enhance.agentic | [] | List of agentic modes for AI-powered enhancements. Can include text, table, and figure scopes. |
| enhance.summarize_figures | true | Include AI-generated figure/image summaries. |
Retrieval (RAG-focused settings)
| Setting | Default | Description |
|---|---|---|
| retrieval.chunking.chunk_mode | variable | Chunking strategy for parsed text. |
| retrieval.chunking.chunk_size | null | Approximate chunk size in characters. When null, the size varies between 250 and 1500. |
| retrieval.filter_blocks | [] | Block types to exclude from output. |
| retrieval.embedding_optimized | false | Include AI-generated table summaries for better embedding. |
Formatting (Output format controls)
| Setting | Default | Description |
|---|---|---|
| formatting.add_page_markers | false | Insert page boundary markers into text. |
| formatting.table_output_format | dynamic | Table rendering format in output. Options: html, json, md, jsonbbox, dynamic, csv. |
| formatting.merge_tables | false | Merge adjacent tables on a page. |
| formatting.include | [] | List of markup to include. Options: change_tracking, highlight, comments. |
Spreadsheet (Spreadsheet-specific settings)
| Setting | Default | Description |
|---|---|---|
| spreadsheet.split_large_tables.enabled | true | Split very large tables into chunks. |
| spreadsheet.split_large_tables.size | 50 | Max rows per large-table chunk. |
| spreadsheet.clustering | accurate | Splits up tables inside spreadsheets. Options: accurate, fast, disabled. |
| spreadsheet.include | [] | List of spreadsheet features to include. Options: cell_colors, formula. |
| spreadsheet.exclude | [] | List of elements to exclude. Options: hidden_sheets, hidden_rows, hidden_cols. |
Settings (General settings)
| Setting | Default | Description |
|---|---|---|
| settings.ocr_system | standard | OCR system preset. Options: standard, legacy. |
| settings.force_url_result | false | Force returning result via URL (vs inline JSON). |
| settings.force_file_extension | null | Force the URL to be downloaded as a specific file extension. |
| settings.return_ocr_data | false | Return low-level OCR words/lines data. |
| settings.return_images | [] | Return images for specified block types. Options: figure, table. |
| settings.embed_pdf_metadata | false | Write OCR text layer back into returned PDF. |
| settings.persist_results | false | Persist outputs for later retrieval. |
| settings.timeout | 900 | The timeout for the job in seconds. |
| settings.page_range | null | The page range to process (1-indexed). |
| settings.document_password | null | Password to decrypt password-protected documents. |
Defaults in action
Both of these calls are equivalent: the first sets every default explicitly, while the second relies on built-in defaults.
Explicit configuration example
Default configuration example
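As a minimal sketch of the two forms, the defaults from the tables above can be written out as a nested Python dictionary and compared with a request that omits them. This assumes that dotted setting names such as settings.timeout map to nested objects in the request body. The `input` field, the example URL, and the `with_defaults` helper are hypothetical names used only for illustration, and no request is actually sent.

```python
import copy
import json

# Defaults transcribed from the tables above. Nesting by dotted name is an
# assumption about the request shape, not a confirmed schema.
PARSE_DEFAULTS = {
    "enhance": {"agentic": [], "summarize_figures": True},
    "retrieval": {
        "chunking": {"chunk_mode": "variable", "chunk_size": None},
        "filter_blocks": [],
        "embedding_optimized": False,
    },
    "formatting": {
        "add_page_markers": False,
        "table_output_format": "dynamic",
        "merge_tables": False,
        "include": [],
    },
    "spreadsheet": {
        "split_large_tables": {"enabled": True, "size": 50},
        "clustering": "accurate",
        "include": [],
        "exclude": [],
    },
    "settings": {
        "ocr_system": "standard",
        "force_url_result": False,
        "force_file_extension": None,
        "return_ocr_data": False,
        "return_images": [],
        "embed_pdf_metadata": False,
        "persist_results": False,
        "timeout": 900,
        "page_range": None,
        "document_password": None,
    },
}


def with_defaults(overrides, defaults=PARSE_DEFAULTS):
    """Return a copy of `defaults` with `overrides` merged in recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical request bodies; "input" and its URL are placeholders.
explicit_request = {"input": "https://example.com/report.pdf", **PARSE_DEFAULTS}
minimal_request = {"input": "https://example.com/report.pdf"}

# Filling in the defaults for the minimal request yields the explicit one.
assert with_defaults(minimal_request) == explicit_request
print(json.dumps(minimal_request, indent=2))
```

The minimal form is easier to read and review, because any setting that does appear in a request is by definition a deliberate override.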
When to Override Defaults
Most of the time, you can rely on the default configurations. However, use cases and document formats vary widely. Here are a few examples:
- Enable agentic processing: For documents with complex tables or figures, use agentic modes to improve accuracy. Set enhance.agentic=[{"scope": "text"}] for text, {"scope": "table"} for tables, or {"scope": "figure"} for figures.
- Return figures and tables: Helpful for research papers or scientific documents where charts and illustrations are common. Set settings.return_images=["figure", "table"].
- Different languages: The default standard OCR system supports multilingual documents. Use settings.ocr_system="legacy" only for backwards compatibility with Germanic languages.
- Track document changes: For legal or compliance review, enable change tracking with formatting.include=["change_tracking"] to detect underlines and strikethroughs.
- Process spreadsheets efficiently: For large spreadsheets, adjust spreadsheet.split_large_tables.size to control chunk sizes, or use spreadsheet.clustering="fast" for faster processing at the cost of some accuracy.
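The overrides above can be sketched as small configuration fragments that contain only the non-default values. This is an illustration under assumptions rather than a confirmed request schema: it assumes dotted setting names correspond to nested objects, the scenario keys and the `flatten` helper are hypothetical, and the split size of 100 is an arbitrary example value.

```python
import json

# Override fragments for the scenarios above; only non-default values appear.
# Nesting mirrors the dotted setting names (an assumption about request shape).
OVERRIDES = {
    "agentic_tables": {"enhance": {"agentic": [{"scope": "table"}]}},
    "return_images": {"settings": {"return_images": ["figure", "table"]}},
    "legacy_ocr": {"settings": {"ocr_system": "legacy"}},
    "change_tracking": {"formatting": {"include": ["change_tracking"]}},
    "fast_spreadsheets": {
        "spreadsheet": {
            "split_large_tables": {"size": 100},  # illustrative value only
            "clustering": "fast",
        }
    },
}


def flatten(config, prefix=""):
    """Convert nested config into {'a.b.c': value} to compare with the tables."""
    flat = {}
    for key, value in config.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


for name, fragment in OVERRIDES.items():
    print(name, json.dumps(flatten(fragment)))
```

Flattening each fragment back to dotted names makes it easy to check an override against the setting names in the default tables before sending it.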
Next Steps
- Learn more about all available parameters in the Parse API Reference.
- Try different configurations interactively in the Studio Playground.
- Continue to Extraction to see how parse output is used as the foundation for structured data.