Best practices for parsing - Reducto API


Parse options affect accuracy, cost, and speed. The following scenarios highlight when to use specific configurations.

Handling handwriting or small text

Use Agentic OCR.
Best for: forms, signatures, handwritten notes.
Tradeoff: slightly higher cost and latency in exchange for higher accuracy.

Working with non-Germanic languages

Use the Multilingual OCR system.
Needed for: any language other than English, Spanish, Italian, Portuguese, French, or German.
Also improves parsing of Unicode and special characters.
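As a minimal sketch of the rule above, the check below decides whether a document's language falls outside the six listed ones. The function name and the use of ISO 639-1 codes are illustrative assumptions and are not part of the Reducto API; the result only tells you when to switch on the Multilingual OCR system in your own parse config.

```python
# The six languages the text lists as not needing the Multilingual OCR system,
# written as ISO 639-1 codes (using codes here is an assumption for illustration).
LISTED_LANGUAGES = {"en", "es", "it", "pt", "fr", "de"}


def needs_multilingual_ocr(language_code: str) -> bool:
    """Return True when the language falls outside the six listed languages."""
    # Reduce tags such as "pt-BR", "pt_BR" or "EN" to the primary subtag.
    primary = language_code.strip().lower().replace("_", "-").split("-")[0]
    return primary not in LISTED_LANGUAGES
```

For example, `needs_multilingual_ocr("ja")` is true, while `needs_multilingual_ocr("pt-BR")` is false.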

Unexpected symbols in your output

Cause: the text embedded in a PDF's metadata may be corrupted or hidden.
Fix: switch to OCR extraction mode.
Avoid: Hybrid or Metadata extraction modes unless you are sure the metadata is reliable.
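One way to spot this problem automatically is to scan the parsed text for characters that rarely appear in clean output. The sketch below is a heuristic under stated assumptions: the function names and the 1% threshold are hypothetical, and it catches only one symptom (replacement, private-use, unassigned, and control characters), not every kind of hidden text.

```python
import unicodedata


def suspicious_ratio(text: str) -> float:
    """Return the fraction of characters that look like extraction debris."""
    if not text:
        return 0.0
    bad = 0
    for ch in text:
        category = unicodedata.category(ch)
        # U+FFFD is the replacement character produced for undecodable bytes;
        # Co = private use, Cn = unassigned code points.
        if ch == "\ufffd" or category in {"Co", "Cn"}:
            bad += 1
        # Cc = control characters; ordinary whitespace controls are allowed.
        elif category == "Cc" and ch not in "\n\r\t":
            bad += 1
    return bad / len(text)


def should_retry_with_ocr(text: str, threshold: float = 0.01) -> bool:
    """Suggest re-parsing in OCR extraction mode when debris exceeds the threshold."""
    return suspicious_ratio(text) > threshold
```

If the check fires for a document, re-running the parse in OCR extraction mode is the fix described above.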

Missing checkboxes or images

Enable these experimental options in the parse config:
- enable_checkboxes: detects checkboxes and returns each one with a True/False value.
- return_figure_images: detects figures in the document and returns an image URL for each.
- return_table_images: detects tables in the document and returns an image URL for each.

URLs from return_figure_images and return_table_images expire in ~24 hours.
Download immediately if you need permanent storage.
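A minimal sketch of saving the images right after a parse, using only the Python standard library. It assumes you have already collected the image URLs from the parse response into a plain list; how the URLs are laid out in the response is not covered here, and `save_images` is a hypothetical helper name.

```python
import shutil
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def save_images(urls: list[str], dest_dir: str) -> list[Path]:
    """Download each URL into dest_dir and return the saved file paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Derive a file name from the URL path; fall back to a generic name.
        name = Path(urlparse(url).path).name or "image"
        # Prefix with the position so repeated names do not overwrite each other.
        target = dest / f"{index:03d}_{name}"
        with urllib.request.urlopen(url, timeout=30) as response, open(target, "wb") as out:
            shutil.copyfileobj(response, out)
        saved.append(target)
    return saved
```

Calling this in the same job that runs the parse keeps the copy step well inside the expiry window.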

Complex tables causing problems?

Option 1: Enrich with table mode
- Enable enrich_mode with table.
- Add prompts to guide how rows and columns should align.
Option 2: AI JSON format
- Set Table Output Format to ai_json.
- Passes the table image to a model for structural analysis.
- Tradeoff: higher latency; accuracy improves on some tables but not all.
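The two options can be sketched as a small helper that builds a config fragment for whichever strategy you pick. The key names `enrich_mode` and `table_output_format` follow the option names used above, but their exact spelling and nesting in a real request, and the `prompt` key, are assumptions; check them against the API reference before use.

```python
from typing import Optional


def table_options(strategy: str, prompt: Optional[str] = None) -> dict:
    """Build a parse-config fragment for one of the two table strategies."""
    if strategy == "enrich":
        # Option 1: enrich in table mode, optionally guided by a prompt.
        options = {"enrich_mode": "table"}
        if prompt:
            options["prompt"] = prompt  # hypothetical key name
        return options
    if strategy == "ai_json":
        # Option 2: pass the table image to a model for structural analysis.
        return {"table_output_format": "ai_json"}
    raise ValueError(f"unknown table strategy: {strategy!r}")
```

Keeping the choice in one place makes it easy to try Option 1 first and fall back to Option 2 only for the tables that still come out misaligned, since Option 2 costs more latency.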

Spreadsheet related configurations

Only three configurations change outputs for spreadsheets:

- include_color_information: adds Excel cell color details with LaTeX to the parse output.
- spreadsheet_table_clustering: splits a multi-table spreadsheet into its individual tables.
- large_table_chunking: splits very large tables into smaller, manageable chunks for downstream processing.
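Because only these three options matter for spreadsheets, it can help to check a config before sending it. The sketch below assumes a flat dictionary of option names (the real request may nest them differently) and uses a hypothetical helper name; it separates the options that change spreadsheet output from those that do not.

```python
# The three option names listed above as affecting spreadsheet output.
SPREADSHEET_OPTIONS = {
    "include_color_information",
    "spreadsheet_table_clustering",
    "large_table_chunking",
}


def split_for_spreadsheet(config: dict) -> tuple[dict, list[str]]:
    """Separate options that change spreadsheet output from those that do not."""
    effective = {k: v for k, v in config.items() if k in SPREADSHEET_OPTIONS}
    # Sorted so the report of no-effect options is stable across runs.
    no_effect = sorted(k for k in config if k not in SPREADSHEET_OPTIONS)
    return effective, no_effect
```

Logging the second return value is a cheap way to notice settings that were copied over from a PDF workflow and do nothing for a spreadsheet.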
