OCR systems overview
Reducto offers two OCR systems, selected with the settings.ocr_system parameter:
- standard (default): Our best multilingual OCR system, which handles documents in a wide range of languages.
- legacy: Only supports Germanic languages and is available for backwards compatibility.
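As a rough illustration of selecting an OCR system, the sketch below builds a settings fragment and validates the value against the two documented options. The helper name build_settings and the nested dictionary shape (a "settings" object containing an "ocr_system" key) are assumptions made for this example, inferred from the settings.ocr_system parameter name; check the API reference for the exact request format.

```python
import json

# The two values documented above; "standard" is the default.
VALID_OCR_SYSTEMS = ("standard", "legacy")


def build_settings(ocr_system: str = "standard") -> dict:
    """Return a settings fragment that selects an OCR system.

    Hypothetical helper. Assumption: settings.ocr_system corresponds to a
    nested {"settings": {"ocr_system": ...}} object in the request body.
    """
    if ocr_system not in VALID_OCR_SYSTEMS:
        raise ValueError(
            f"ocr_system must be one of {VALID_OCR_SYSTEMS}, got {ocr_system!r}"
        )
    return {"settings": {"ocr_system": ocr_system}}


if __name__ == "__main__":
    # Show the fragment that would be merged into a request body.
    print(json.dumps(build_settings("legacy"), indent=2))
```

Validating the value locally is a design choice for the sketch: it surfaces a typo such as "legasy" before any request is sent.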
Languages supported by OCR system
Standard OCR system
The standard OCR system is our recommended default and supports a comprehensive set of languages from around the world. The following languages are prioritized and regularly evaluated for quality:
Afrikaans, Albanian, Arabic, Armenian, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Yiddish
This system provides high-quality OCR results for documents in any of these languages, including documents that mix several languages.
Legacy OCR system
The legacy OCR system is optimized for Germanic languages only and is provided for backwards compatibility:
- English
- German
- Dutch
- Norwegian
- Swedish
- Danish
- Icelandic
- Afrikaans
We recommend the standard OCR system for all new projects, as it provides better accuracy and supports all the languages above plus many more.
Choosing the right OCR system
- Use standard (default) for all new projects and for documents in any language, including multilingual documents.
- Use legacy only if you need backwards compatibility with previous integrations that specifically relied on the legacy system's behavior.
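The guidance above can be sketched as a small decision helper. The function choose_ocr_system and its parameters are hypothetical names introduced for illustration, not part of the Reducto API; the legacy language set is copied from the list in this page. The sketch falls back to standard whenever a document includes a language the legacy system does not cover.

```python
# Languages listed for the legacy OCR system on this page (lowercased).
LEGACY_LANGUAGES = frozenset({
    "english", "german", "dutch", "norwegian",
    "swedish", "danish", "icelandic", "afrikaans",
})


def choose_ocr_system(languages, needs_legacy_behavior=False):
    """Pick an ocr_system value following the guidance above.

    Hypothetical helper: returns "legacy" only when the caller explicitly
    needs legacy behavior AND every language is covered by the legacy
    system; otherwise returns "standard", the recommended default.
    """
    normalized = {lang.strip().lower() for lang in languages}
    if needs_legacy_behavior and normalized <= LEGACY_LANGUAGES:
        return "legacy"
    return "standard"


if __name__ == "__main__":
    print(choose_ocr_system(["English", "German"]))
    print(choose_ocr_system(["English"], needs_legacy_behavior=True))
    print(choose_ocr_system(["English", "Japanese"], needs_legacy_behavior=True))
```

Treating standard as the fallback mirrors the recommendation to use it for all new projects; legacy has to be requested deliberately.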