Parse options affect accuracy, cost, and speed. The following scenarios highlight when to use specific configurations.

Handling handwriting or small text

  • Use Agentic OCR.
  • Best for: forms, signatures, handwritten notes.
  • Tradeoff: slightly higher cost and latency in exchange for higher accuracy.

Working with languages outside the default set

  • Use the Multilingual OCR system.
  • Needed for: languages outside English, Spanish, Italian, Portuguese, French, or German.
  • Also improves parsing of Unicode and special characters.
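
The language rule above can be captured in a small pre-flight check. This is a minimal sketch: the helper name `needs_multilingual_ocr` and the constant `DEFAULT_OCR_LANGUAGES` are hypothetical, and only the list of six languages comes from the text. How the result maps onto an actual parse option is left to the caller.

```python
# Languages the text lists as covered without the Multilingual OCR system.
DEFAULT_OCR_LANGUAGES = {
    "english", "spanish", "italian", "portuguese", "french", "german",
}


def needs_multilingual_ocr(languages):
    """Return True if any document language falls outside the default set.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return any(
        lang.strip().lower() not in DEFAULT_OCR_LANGUAGES for lang in languages
    )


if __name__ == "__main__":
    print(needs_multilingual_ocr(["English", "French"]))
    print(needs_multilingual_ocr(["English", "Japanese"]))
```

Running the check before submitting a document keeps the decision explicit instead of relying on a default.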

Unexpected symbols in your output

  • Cause: metadata embeddings in PDFs may contain corrupted or hidden text.
  • Fix: switch to OCR extraction mode.
  • Avoid: Hybrid or Metadata extraction modes unless you are sure the metadata is reliable.
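
One way to notice this problem automatically is to scan the parsed text for characters that rarely appear in clean output. The sketch below is a heuristic under stated assumptions, not part of any SDK: the function names, the set of suspicious Unicode categories, and the 2% threshold are all illustrative choices.

```python
import unicodedata

# Unicode categories that rarely belong in clean parsed text (assumption):
# Co = private use, Cn = unassigned, Cc = control characters.
SUSPICIOUS_CATEGORIES = {"Co", "Cn", "Cc"}
ALLOWED_CONTROL = {"\n", "\r", "\t"}


def suspicious_ratio(text):
    """Fraction of characters that look like corrupted or hidden text."""
    if not text:
        return 0.0
    bad = sum(
        1
        for ch in text
        if ch == "\ufffd"  # Unicode replacement character
        or (
            ch not in ALLOWED_CONTROL
            and unicodedata.category(ch) in SUSPICIOUS_CATEGORIES
        )
    )
    return bad / len(text)


def should_retry_with_ocr(text, threshold=0.02):
    """Suggest re-parsing in OCR extraction mode above the threshold.

    The default threshold is an arbitrary starting point, not a documented value.
    """
    return suspicious_ratio(text) > threshold
```

If the check fires, re-run the parse with OCR extraction mode and compare the two outputs before trusting either.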

Missing checkboxes or images

  • Enable experimental options in parse configs:
    • enable_checkboxes: Detects checkboxes and returns each one with a True/False value.
    • return_figure_images: Detects figures in the document and returns them as images.
    • return_table_images: Detects tables in the document and returns them as images.

Note: URLs from return_figure_images and return_table_images expire in about 24 hours. Download the images immediately if you need permanent storage.
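
Because the URLs are short-lived, it helps to persist the images as soon as the parse result arrives. The following is a minimal sketch using only the Python standard library. The function `save_images` is hypothetical, and it assumes you have already collected the image URLs from the parse result into a list; where those URLs sit in the response is not specified here.

```python
import os
import urllib.request
from urllib.parse import urlparse


def _http_fetch(url):
    """Download the bytes behind a URL using the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_images(urls, dest_dir, fetch=_http_fetch):
    """Save each URL's content under dest_dir and return the local paths.

    `fetch` is injectable so the function can be exercised without a network.
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for index, url in enumerate(urls):
        # Name the file after the URL path; the index prefix avoids collisions.
        name = os.path.basename(urlparse(url).path) or "image"
        path = os.path.join(dest_dir, f"{index:03d}_{name}")
        with open(path, "wb") as handle:
            handle.write(fetch(url))
        saved.append(path)
    return saved
```

Storing the local paths alongside your own document IDs means nothing downstream depends on the expiring URLs.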

Complex tables causing problems

  • Option 1: Enrich with table mode
    • Enable enrich_mode with table.
    • Add prompts to guide how rows and columns should align.
  • Option 2: AI JSON format

Only three configurations change the output for spreadsheets:
  • include_color_information: adds Excel cell color details with LaTeX to the parse output.
  • spreadsheet_table_clustering: splits a multi-table spreadsheet into its individual tables.
  • large_table_chunking: splits very large tables into smaller, manageable chunks for downstream processing.
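
To make the idea behind large_table_chunking concrete, here is a small illustration of splitting a table into row-bounded chunks. It is a sketch of the general technique only: the function `chunk_table`, the `max_rows` parameter, and the choice to repeat the header row in every chunk are assumptions, not a description of how the service implements the option.

```python
def chunk_table(rows, max_rows):
    """Split a table (header row first) into chunks that each repeat the header.

    Illustrative only: chunk size and header repetition are assumptions.
    """
    if max_rows < 1:
        raise ValueError("max_rows must be at least 1")
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    if not body:
        return [[header]]
    return [
        [header] + body[start:start + max_rows]
        for start in range(0, len(body), max_rows)
    ]


if __name__ == "__main__":
    table = [["id", "amount"], [1, 10], [2, 20], [3, 30]]
    for chunk in chunk_table(table, max_rows=2):
        print(chunk)
```

Repeating the header keeps each chunk interpretable on its own, which matters when chunks are processed independently downstream.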