Supported Languages
Languages supported by Reducto’s OCR systems
Reducto’s OCR systems support a wide variety of languages across different writing systems. Which languages are supported depends on the OCR system you choose.
OCR Systems Overview
Reducto offers different OCR systems that can be specified using the ocr_system parameter in the advanced options:
- highres: Optimized for documents with English, Spanish, Italian, Portuguese, French, and German text.
- multilingual: Supports a comprehensive set of languages from around the world.
- combined: Uses a combination of OCR systems to improve results on multilingual documents, at a small latency cost.
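As a minimal sketch of how the ocr_system value might be validated before it is sent, the snippet below checks it against the three values listed above. The helper name build_advanced_options, the "advanced_options" and "document_url" payload keys, and the example URL are assumptions made for illustration; consult the API reference for the actual request shape.

```python
# The three ocr_system values named on this page.
VALID_OCR_SYSTEMS = ("highres", "multilingual", "combined")


def build_advanced_options(ocr_system: str) -> dict:
    """Return an options mapping with a validated ocr_system value.

    Hypothetical helper: it only guards against typos in the value.
    """
    if ocr_system not in VALID_OCR_SYSTEMS:
        raise ValueError(
            f"Unknown ocr_system {ocr_system!r}; expected one of {VALID_OCR_SYSTEMS}"
        )
    return {"ocr_system": ocr_system}


# Assumed payload layout; the key names and URL here are illustrative only.
payload = {
    "document_url": "https://example.com/sample.pdf",
    "advanced_options": build_advanced_options("multilingual"),
}
```

Validating the value locally means a misspelled system name fails fast in your own code rather than in a remote request.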
Languages Supported by OCR System
Highres OCR System
The highres OCR system is optimized for the following languages:
- English
- Spanish
- Italian
- Portuguese
- French
- German
This system provides high-quality OCR results for documents primarily in these languages.
Multilingual OCR System
The multilingual OCR system supports a much wider range of languages, categorized by their level of support:
Fully Supported Languages
The following languages are prioritized and regularly evaluated for quality:
Afrikaans, Albanian, Arabic, Armenian, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Yiddish
Choosing the Right OCR System
- Use highres for documents primarily in English, Spanish, Italian, Portuguese, French, or German.
- Use multilingual for documents containing languages beyond those supported by highres.
- Use combined for multilingual documents where you need the highest possible accuracy across multiple languages, with a small latency cost.
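The guidelines above can be expressed as a small selection function. This is a sketch under stated assumptions, not part of Reducto's API: the function name choose_ocr_system, the need_highest_accuracy flag, and the use of English language names as identifiers are all hypothetical choices for illustration.

```python
# Languages the highres system is optimized for, as listed on this page.
HIGHRES_LANGUAGES = frozenset(
    {"english", "spanish", "italian", "portuguese", "french", "german"}
)


def choose_ocr_system(languages, need_highest_accuracy=False):
    """Pick an ocr_system value from the languages expected in a document.

    Hypothetical helper that mirrors the guidelines in this section.
    """
    normalized = {lang.strip().lower() for lang in languages}
    if not normalized:
        raise ValueError("At least one language is required")

    # Every language is covered by highres.
    if normalized <= HIGHRES_LANGUAGES:
        return "highres"

    # Several languages, accuracy matters more than a small latency cost.
    if len(normalized) > 1 and need_highest_accuracy:
        return "combined"

    # At least one language falls outside the highres set.
    return "multilingual"
```

For example, choose_ocr_system(["English", "French"]) selects highres, while choose_ocr_system(["English", "Japanese"], need_highest_accuracy=True) selects combined.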
For more information on OCR options, see the OCR Options documentation.