Reducto provides several options that control how Optical Character Recognition (OCR) is performed on your documents, so you can tune the OCR process to your specific needs.

OCR Mode

The ocr_mode parameter controls how OCR is performed on your documents:

  • standard (default): Uses the standard OCR process for all document content.
  • agentic: Enables automatic editing of OCR results, which can improve accuracy for complex tables (merged cells, nested headers, etc.) and tricky text (handwriting, small symbols).
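
As a minimal sketch, the ocr_mode value could be validated and placed in a request payload as shown below. The helper name build_ocr_options and the nesting under an "options" key are assumptions for illustration, not a confirmed request shape; check the Reducto API reference before relying on them.

```python
# Allowed values, taken from the list above.
OCR_MODES = ("standard", "agentic")


def build_ocr_options(ocr_mode: str = "standard") -> dict:
    """Return a request fragment carrying ocr_mode (hypothetical helper).

    The "options" wrapper key is an assumption, not a confirmed API shape.
    """
    if ocr_mode not in OCR_MODES:
        raise ValueError(f"ocr_mode must be one of {OCR_MODES}, got {ocr_mode!r}")
    return {"options": {"ocr_mode": ocr_mode}}


# Opt in to agentic OCR for a document with complex tables.
payload = build_ocr_options("agentic")
```

Validating the value locally surfaces a misspelled mode before any request is sent.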

When to Use Agentic OCR Mode

Consider using the agentic OCR mode when:

  1. Accuracy is critical for your application, and you’re seeing small discrepancies in standard OCR.
  2. You’re willing to accept a small increase in processing time and cost (2x credits) for improved accuracy.

The agentic OCR mode uses AI to automatically edit and correct OCR results, which can significantly improve the quality of extractions.
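
The decision above can be sketched as a small helper. This assumes both listed conditions must hold before choosing agentic mode, and that the "2x credits" figure applies as a flat multiplier on whatever a standard-mode run would cost. The function names and the base_credits input are hypothetical; actual billing depends on your plan.

```python
AGENTIC_CREDIT_MULTIPLIER = 2  # "2x credits", as stated above


def choose_ocr_mode(accuracy_critical: bool, accepts_extra_cost: bool) -> str:
    """Pick agentic only when both conditions listed above hold (assumption)."""
    if accuracy_critical and accepts_extra_cost:
        return "agentic"
    return "standard"


def estimated_credits(base_credits: float, ocr_mode: str) -> float:
    """Scale a caller-supplied standard-mode credit count by the mode's multiplier."""
    multiplier = AGENTIC_CREDIT_MULTIPLIER if ocr_mode == "agentic" else 1
    return base_credits * multiplier


mode = choose_ocr_mode(accuracy_critical=True, accepts_extra_cost=True)
cost = estimated_credits(10, mode)  # 10 is a made-up standard-mode cost
```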

Extraction Mode

The extraction_mode parameter controls the method used for text extraction:

  • ocr (default): Uses only OCR to extract text from the document.
  • metadata: Uses only the document’s embedded text (if available).
  • hybrid: Attempts to use the document’s embedded text first, then falls back to OCR if needed.
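
The difference between the three modes can be illustrated with a conceptual sketch. This is not Reducto's implementation: the function extract_text, its arguments, and the rule that hybrid falls back to OCR when the embedded text is missing or blank are all assumptions made for illustration.

```python
from typing import Callable, Optional

# Allowed values, taken from the list above.
EXTRACTION_MODES = ("ocr", "metadata", "hybrid")


def extract_text(
    mode: str,
    embedded_text: Optional[str],
    run_ocr: Callable[[], str],
) -> str:
    """Illustrate how each extraction_mode chooses its text source."""
    if mode not in EXTRACTION_MODES:
        raise ValueError(f"extraction_mode must be one of {EXTRACTION_MODES}")

    if mode == "ocr":
        # OCR only: embedded text is ignored even when present.
        return run_ocr()

    has_embedded = bool(embedded_text and embedded_text.strip())
    if mode == "metadata":
        # Embedded text only: nothing is returned when the document has none.
        return embedded_text if has_embedded else ""

    # hybrid: prefer embedded text, fall back to OCR (assumed trigger: no usable text).
    return embedded_text if has_embedded else run_ocr()
```

For example, a scanned PDF with no text layer would yield an empty result under metadata in this sketch, while hybrid would call run_ocr instead.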

OCR System

For advanced users, the ocr_system parameter (in the advanced options) allows you to specify which OCR system to use:

  • highres: Recommended for documents with English characters.
  • multilingual: Better for documents with non-English characters.
  • combined: Uses a combination of OCR systems to improve results on multilingual documents, at a small latency cost.
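
A minimal sketch of choosing an ocr_system from the guidance above and placing it in the advanced options. The helper names, the two boolean inputs, and the "advanced_options" key are assumptions for illustration. Mapping mixed-language documents to combined is one reading of the list, not a documented rule.

```python
# Allowed values, taken from the list above.
OCR_SYSTEMS = ("highres", "multilingual", "combined")


def pick_ocr_system(has_english: bool, has_non_english: bool) -> str:
    """Map a document's language mix to an ocr_system value (hypothetical rule)."""
    if has_english and has_non_english:
        return "combined"      # mixed content, accepts a small latency cost
    if has_non_english:
        return "multilingual"
    return "highres"           # English-only content


def build_advanced_options(ocr_system: str) -> dict:
    """Wrap ocr_system in an advanced-options fragment (key name is assumed)."""
    if ocr_system not in OCR_SYSTEMS:
        raise ValueError(f"ocr_system must be one of {OCR_SYSTEMS}")
    return {"advanced_options": {"ocr_system": ocr_system}}


fragment = build_advanced_options(pick_ocr_system(True, True))
```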