OCR Options
Reducto provides several options to control how Optical Character Recognition (OCR) is performed on your documents. These options allow you to fine-tune the OCR process based on your specific needs.
OCR Mode
The ocr_mode parameter controls how OCR is performed on your documents:
- standard (default): Uses the standard OCR process for all document content.
- agentic: Enables automatic editing of OCR results, which can improve accuracy for complex tables (merged cells, nested headers, etc.) and difficult text (handwriting, small symbols).
When to Use Agentic OCR Mode?
Consider using the agentic OCR mode when:
- Accuracy is critical for your application, and you’re seeing small discrepancies in standard OCR.
- You’re willing to accept a small increase in processing time and double the credit cost (2x credits) in exchange for improved accuracy.
The agentic OCR mode uses AI to automatically edit and correct OCR results, which can significantly improve the quality of extractions.
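As a rough illustration of the trade-off above, the following sketch validates an ocr_mode value and estimates the credit cost of a job. The helper names (build_ocr_options, estimated_credits) and the flat dictionary layout are assumptions made for this example, not part of Reducto's API; only the two mode names and the 2x credit figure come from the text above.

```python
# Hypothetical helpers for illustration; not Reducto's client library.
VALID_OCR_MODES = {"standard", "agentic"}

# Credit multipliers taken from the description above: agentic costs 2x credits.
CREDIT_MULTIPLIER = {"standard": 1, "agentic": 2}


def build_ocr_options(ocr_mode: str = "standard") -> dict:
    """Return an options dict after checking that ocr_mode is a documented value."""
    if ocr_mode not in VALID_OCR_MODES:
        raise ValueError(
            f"ocr_mode must be one of {sorted(VALID_OCR_MODES)}, got {ocr_mode!r}"
        )
    return {"ocr_mode": ocr_mode}


def estimated_credits(base_credits: int, ocr_mode: str = "standard") -> int:
    """Estimate credits for a job, given its cost in standard mode."""
    build_ocr_options(ocr_mode)  # reuse the validation above
    return base_credits * CREDIT_MULTIPLIER[ocr_mode]


if __name__ == "__main__":
    print(build_ocr_options("agentic"))
    print(estimated_credits(10, "agentic"))
```

A check like this can help decide per document whether the accuracy gain justifies the doubled credit cost before a request is submitted.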
Extraction Mode
The extraction_mode parameter controls the method used for text extraction:
- ocr (default): Uses only OCR to extract text from the document.
- metadata: Uses only the document’s embedded text (if available).
- hybrid: Attempts to use the document’s embedded text first, then falls back to OCR if needed.
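The three modes can be read as a simple selection rule. The sketch below models that rule locally; it is not how Reducto implements extraction. The function extract_text and its run_ocr callback are hypothetical, and the sketch assumes that "if needed" in hybrid mode means the embedded text is missing or blank.

```python
from typing import Callable, Optional


def extract_text(
    embedded_text: Optional[str],
    run_ocr: Callable[[], str],
    extraction_mode: str = "ocr",
) -> str:
    """Pick a text source according to extraction_mode (illustrative model only)."""
    if extraction_mode == "ocr":
        # OCR only: ignore any embedded text.
        return run_ocr()
    if extraction_mode == "metadata":
        # Embedded text only: return an empty string when none is available.
        return embedded_text or ""
    if extraction_mode == "hybrid":
        # Assumption: fall back to OCR when embedded text is missing or blank.
        if embedded_text and embedded_text.strip():
            return embedded_text
        return run_ocr()
    raise ValueError(f"unknown extraction_mode: {extraction_mode!r}")


if __name__ == "__main__":
    fake_ocr = lambda: "text from OCR"  # stand-in for a real OCR call
    print(extract_text("text from PDF layer", fake_ocr, "hybrid"))
    print(extract_text(None, fake_ocr, "hybrid"))
```

Under this reading, hybrid suits mixed batches of digital and scanned PDFs, since scanned pages without an embedded text layer still get OCR.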
OCR System
For advanced users, the ocr_system parameter (in the advanced options) allows you to specify which OCR system to use:
- highres: Recommended for documents with English characters.
- multilingual: Better for documents with non-English characters.
- combined: Uses a combination of OCR systems for improved results for multilingual documents at a small latency cost.
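One possible way to apply these recommendations is a small selection helper, sketched below. The function names, the "advanced_options" key, and the rule that mixed English and non-English content maps to combined are all assumptions made for illustration; consult the API reference for the actual request shape.

```python
def choose_ocr_system(has_english: bool, has_non_english: bool) -> str:
    """Map document language content to an ocr_system value (illustrative rule)."""
    if has_english and has_non_english:
        # Assumption: mixed-language documents benefit from the combined system,
        # at the small latency cost noted above.
        return "combined"
    if has_non_english:
        return "multilingual"
    return "highres"


def build_advanced_options(has_english: bool, has_non_english: bool) -> dict:
    """Nest ocr_system under a hypothetical 'advanced_options' key."""
    return {
        "advanced_options": {
            "ocr_system": choose_ocr_system(has_english, has_non_english)
        }
    }


if __name__ == "__main__":
    print(build_advanced_options(has_english=True, has_non_english=True))
```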