Reducto provides several options to control how Optical Character Recognition (OCR) is performed on your documents. These options allow you to fine-tune the OCR process based on your specific needs.

OCR Mode

The ocr_mode parameter controls how OCR is performed, and in particular whether table OCR results are automatically edited:

  • standard (default): Uses the standard OCR process for all document content.
  • agentic: Enables automatic editing of table OCR results, which can improve accuracy for complex tables at the cost of slightly higher latency.

Extraction Mode

The extraction_mode parameter controls the method used for text extraction:

  • ocr (default): Uses only OCR to extract text from the document.
  • metadata: Uses only the document’s embedded text (if available).
  • hybrid: Attempts to use the document’s embedded text first, then falls back to OCR if needed.
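
The three modes can be read as a simple selection rule. The sketch below is illustrative only and is not Reducto's implementation: extract_text, embedded_text, and run_ocr are hypothetical names, and it assumes that "if needed" means the embedded text is missing or blank.

```python
from typing import Callable, Optional

VALID_EXTRACTION_MODES = {"ocr", "metadata", "hybrid"}


def extract_text(
    mode: str,
    embedded_text: Optional[str],
    run_ocr: Callable[[], str],
) -> str:
    """Illustrative selection rule for extraction_mode (hypothetical helper)."""
    if mode not in VALID_EXTRACTION_MODES:
        raise ValueError(f"unknown extraction_mode: {mode!r}")
    if mode == "ocr":
        # OCR only: any embedded text is ignored.
        return run_ocr()
    if mode == "metadata":
        # Embedded text only: empty string when the document has none.
        return embedded_text or ""
    # hybrid: prefer embedded text. The fallback condition used here
    # (missing or blank text) is an assumption made for this sketch.
    if embedded_text and embedded_text.strip():
        return embedded_text
    return run_ocr()
```

Passing run_ocr as a callable means OCR only runs when the selected mode actually calls for it, which is the practical appeal of hybrid for documents that already carry a text layer.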

OCR System

For advanced users, the ocr_system parameter (in the advanced options) allows you to specify which OCR system to use:

  • highres: Recommended for documents with English characters.
  • multilingual: Better for documents with non-English characters.
  • combined: Combines OCR systems to improve results on multilingual documents, at a small latency cost.
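
To show how the three parameters fit together, here is a minimal sketch that validates the values listed on this page and assembles them into a plain dictionary. The helper build_ocr_settings and the top-level key names "options" and "advanced_options" are assumptions made for illustration; check the API reference for the exact request shape. No default is assumed for ocr_system, so it is included only when given.

```python
from typing import Any, Dict, Optional

OCR_MODES = {"standard", "agentic"}
EXTRACTION_MODES = {"ocr", "metadata", "hybrid"}
OCR_SYSTEMS = {"highres", "multilingual", "combined"}


def build_ocr_settings(
    ocr_mode: str = "standard",
    extraction_mode: str = "ocr",
    ocr_system: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate OCR settings and group them (hypothetical payload layout)."""
    if ocr_mode not in OCR_MODES:
        raise ValueError(f"ocr_mode must be one of {sorted(OCR_MODES)}")
    if extraction_mode not in EXTRACTION_MODES:
        raise ValueError(
            f"extraction_mode must be one of {sorted(EXTRACTION_MODES)}"
        )
    if ocr_system is not None and ocr_system not in OCR_SYSTEMS:
        raise ValueError(f"ocr_system must be one of {sorted(OCR_SYSTEMS)}")

    settings: Dict[str, Any] = {
        # Assumed key name for the general options group.
        "options": {
            "ocr_mode": ocr_mode,
            "extraction_mode": extraction_mode,
        },
        # ocr_system is described as an advanced option; key name assumed.
        "advanced_options": {},
    }
    if ocr_system is not None:
        settings["advanced_options"]["ocr_system"] = ocr_system
    return settings


# Example: a multilingual document with complex tables.
settings = build_ocr_settings(
    ocr_mode="agentic", extraction_mode="hybrid", ocr_system="combined"
)
```

Validating locally catches a misspelled value before any request is sent.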

When to Use Agentic OCR Mode

Consider using the agentic OCR mode when:

  1. Your documents contain complex tables, for example with merged cells or nested headers
  2. Table accuracy is critical for your application
  3. You’re willing to accept a small increase in processing time and cost for improved table accuracy

The agentic OCR mode uses AI to automatically edit and correct table OCR results, which can significantly improve the quality of extracted tables.
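
The checklist above can be written as a small decision helper. This is a sketch under one stated assumption: the extra time and cost must be acceptable, and at least one of the two table-related conditions must hold. The function name and its parameters are hypothetical and are not part of Reducto's API.

```python
def choose_ocr_mode(
    has_complex_tables: bool,
    table_accuracy_critical: bool,
    accepts_extra_time_and_cost: bool,
) -> str:
    """Return an ocr_mode value based on the checklist (hypothetical helper)."""
    if not accepts_extra_time_and_cost:
        # Without budget for extra latency and cost, stay on the default.
        return "standard"
    if has_complex_tables or table_accuracy_critical:
        # Assumed rule: either table-related condition justifies agentic mode.
        return "agentic"
    return "standard"


# Example: financial statements with nested headers, accuracy matters.
mode = choose_ocr_mode(
    has_complex_tables=True,
    table_accuracy_critical=True,
    accepts_extra_time_and_cost=True,
)
```

A stricter policy that requires both table conditions would replace `or` with `and`; which one fits depends on how costly table errors are for your application.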