Understand which settings run automatically and when to override them
Basic options
Setting | Default | Description |
---|---|---|
options.ocr_mode | standard | Whether to use Agentic OCR. |
options.extraction_mode | ocr | Extraction source for text content. |
options.chunking.chunk_mode | variable | Chunking strategy for parsed text. |
options.table_summary.enabled | false | Include AI-generated table summaries. |
options.figure_summary.enabled | false | Include AI-generated figure/image summaries. |
options.filter_blocks | [] | Block types to exclude from output. |
options.force_url_result | false | Force returning result via URL (vs inline JSON). |
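
To make the override pattern concrete, here is a minimal sketch that starts from the defaults in the table above and applies a single override. It assumes the dotted setting names map onto nested JSON objects in the request body. The table does not confirm that mapping, and `nest` and `build_config` are hypothetical helper names, not part of any SDK.

```python
import copy
import json

# Defaults copied from the "Basic options" table above.
BASIC_DEFAULTS = {
    "options.ocr_mode": "standard",
    "options.extraction_mode": "ocr",
    "options.chunking.chunk_mode": "variable",
    "options.table_summary.enabled": False,
    "options.figure_summary.enabled": False,
    "options.filter_blocks": [],
    "options.force_url_result": False,
}


def nest(flat):
    """Expand dotted keys into a nested dict (assumed payload shape)."""
    nested = {}
    for dotted_key, value in flat.items():
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


def build_config(overrides):
    """Merge overrides onto the defaults, rejecting unknown setting names."""
    unknown = set(overrides) - set(BASIC_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown settings: {sorted(unknown)}")
    merged = copy.deepcopy(BASIC_DEFAULTS)
    merged.update(overrides)
    return nest(merged)


# Override one default: turn on AI-generated table summaries.
config = build_config({"options.table_summary.enabled": True})
print(json.dumps(config, indent=2))
```

Rejecting unknown names up front catches typos such as `options.table_summaries.enabled` before a request is sent.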
Advanced options
Setting | Default | Description |
---|---|---|
advanced_options.ocr_system | highres | OCR system preset. |
advanced_options.table_output_format | html | Table rendering format in output. |
advanced_options.merge_tables | false | Merge adjacent tables on a page. |
advanced_options.include_formula_information | false | Include spreadsheet formula details. |
advanced_options.continue_hierarchy | true | Preserve document hierarchy across chunks. |
advanced_options.large_table_chunking.enabled | true | Split very large tables into chunks. |
advanced_options.large_table_chunking.size | 50 | Max rows per large-table chunk. |
advanced_options.spreadsheet_table_clustering | default | Splits a spreadsheet sheet that contains multiple tables into separate tables. |
advanced_options.add_page_markers | false | Insert page boundary markers into text. |
advanced_options.remove_text_formatting | false | Strip bold/italics and text styling. |
advanced_options.return_ocr_data | false | Return low-level OCR words/lines data. |
advanced_options.filter_line_numbers | false | Remove leading line numbers. |
advanced_options.read_comments | false | Parse comments from the PDF. |
advanced_options.persist_results | false | Persist outputs for later retrieval. |
advanced_options.exclude_hidden_sheets | false | Skip hidden sheets in spreadsheets. |
advanced_options.exclude_hidden_rows_cols | false | Skip hidden rows/columns in spreadsheets. |
advanced_options.enable_change_tracking | false | Detects strikethrough and underlines, and adds scripts. |
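
As an illustration of what `large_table_chunking.size` controls, the sketch below splits a table's rows into chunks of at most 50 rows. This is not the service's implementation, only a plain-Python picture of a "max rows per chunk" limit. `chunk_rows` is a hypothetical name, and whether header rows are repeated in each chunk is not stated in the table above.

```python
def chunk_rows(rows, size=50):
    """Split rows into consecutive chunks holding at most `size` rows each."""
    if size <= 0:
        raise ValueError("size must be a positive integer")
    return [rows[i:i + size] for i in range(0, len(rows), size)]


# A 120-row table with the default size of 50.
rows = [f"row {n}" for n in range(120)]
chunks = chunk_rows(rows)
print([len(chunk) for chunk in chunks])  # [50, 50, 20]
```

A smaller `size` produces more, shorter chunks. A table with `size` rows or fewer stays in a single chunk.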
Experimental options
Setting | Default | Description |
---|---|---|
experimental_options.enrich.enabled | false | Enable post-parse enrichment pass. |
experimental_options.enrich.mode | standard | Block types to be enriched. |
experimental_options.native_office_conversion | false | Use a Windows VM instead of LibreOffice to convert files. |
experimental_options.enable_checkboxes | false | Detect and return checkbox fields. |
experimental_options.enable_equations | false | Detect and return math equations. |
experimental_options.rotate_pages | true | Auto-rotate misoriented pages. |
experimental_options.rotate_figures | false | Auto-rotate figures/images. This is separate from page rotation. |
experimental_options.enable_scripts | false | Detect and return subscripts and superscripts. |
experimental_options.return_figure_images | false | Return figure images. |
experimental_options.return_table_images | false | Return table images. |
experimental_options.layout_model | default | Layout analysis model; beta is the newer option. |
experimental_options.embed_text_metadata_pdf | false | Write OCR text layer back into returned PDF. |
experimental_options.danger_filter_wide_boxes | false | Do not use. Filter overly wide bounding boxes. |
Disable rotate_pages for a reduction in latency.
return_figure_images and return_table_images.
Use multilingual mode for ocr_system for documents that have non-Germanic languages.
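
The note on multilingual mode can be sketched as a small helper that picks an `ocr_system` value from the languages expected in a document. The values `highres` and `multilingual` come from this page. The helper name, the list of language codes, and the nested payload shape are assumptions made for illustration.

```python
# ISO 639-1 codes for some Germanic languages (illustrative, not exhaustive).
GERMANIC_CODES = {"en", "de", "nl", "sv", "da", "no", "nb", "nn", "is", "af", "fy", "lb", "fo"}


def pick_ocr_system(language_codes):
    """Return 'multilingual' if any expected language is non-Germanic, else 'highres'."""
    codes = {code.lower() for code in language_codes}
    return "highres" if codes <= GERMANIC_CODES else "multilingual"


def ocr_override(language_codes):
    """Build the override fragment, assuming dotted names map to nested objects."""
    return {"advanced_options": {"ocr_system": pick_ocr_system(language_codes)}}


print(ocr_override(["en", "de"]))  # keeps the default preset
print(ocr_override(["en", "ja"]))  # switches to multilingual
```

An empty language list falls back to the default preset, which matches the table's default of `highres`.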