Reducto’s OCR systems support many languages across different writing systems. The set of supported languages depends on which OCR system you choose.

OCR systems overview

Reducto offers two OCR systems, selected with the settings.ocr_system parameter:
  • standard (default): Our best multilingual OCR system. It handles documents in a wide range of languages and writing systems.
  • legacy: Supports only Germanic languages and is kept for backwards compatibility.
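# `client` is an initialized Reducto client; `upload` is a previously uploaded document.
client.parse.run(
    input=upload,
    settings={
        "ocr_system": "standard"  # default; set to "legacy" for the legacy system
    }
)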
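Because settings.ocr_system accepts only the two values above, it can help to validate the value before making the call. The following is a minimal sketch, not part of the Reducto SDK: the helper name build_ocr_settings and the constant VALID_OCR_SYSTEMS are hypothetical, and the returned dict is simply the settings argument shown in the example above.

```python
# Hypothetical helper (not part of the Reducto SDK): builds the `settings`
# dict for a parse call and rejects values other than the two documented ones.
VALID_OCR_SYSTEMS = ("standard", "legacy")


def build_ocr_settings(ocr_system: str = "standard") -> dict:
    """Return a settings dict, defaulting to the standard OCR system."""
    if ocr_system not in VALID_OCR_SYSTEMS:
        raise ValueError(
            f"Unknown ocr_system {ocr_system!r}; expected one of {VALID_OCR_SYSTEMS}"
        )
    return {"ocr_system": ocr_system}


# The result can be passed as the `settings` argument of the parse call.
settings = build_ocr_settings("legacy")
```

Failing early on a misspelled value keeps the mistake local instead of surfacing it as an API error later.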

Languages supported by OCR system

Standard OCR system

The standard OCR system is our recommended default and supports a comprehensive set of languages from around the world. The following languages are prioritized and regularly evaluated for quality:

Afrikaans, Albanian, Arabic, Armenian, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Yiddish.

This system provides high-quality OCR results for documents in any of these languages, including documents that mix several languages.

Legacy OCR system

The legacy OCR system supports only the following Germanic languages:
  • English
  • German
  • Dutch
  • Norwegian
  • Swedish
  • Danish
  • Icelandic
  • Afrikaans
Note: The legacy system is maintained for backwards compatibility only. We strongly recommend using the standard OCR system for all new projects, as it provides better accuracy and supports all the languages above plus many more.

Choosing the right OCR system

  • Use standard (default) for all new projects and for documents in any language, including multilingual documents.
  • Use legacy only if you need backwards compatibility with previous integrations that specifically relied on the legacy system’s behavior.
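The guidance above can be sketched as a small decision helper. This is an illustrative assumption rather than Reducto functionality: the function choose_ocr_system, its relies_on_legacy_behavior flag, and the LEGACY_LANGUAGES constant are hypothetical names, and the language set is copied from the legacy list on this page.

```python
# Hypothetical helper (not part of the Reducto SDK). The language set mirrors
# the legacy OCR system's list documented on this page.
LEGACY_LANGUAGES = frozenset({
    "english", "german", "dutch", "norwegian",
    "swedish", "danish", "icelandic", "afrikaans",
})


def choose_ocr_system(languages, relies_on_legacy_behavior=False):
    """Return "standard" unless an integration explicitly depends on legacy."""
    if not relies_on_legacy_behavior:
        return "standard"
    # Normalize names so "German" and " german " compare equal.
    requested = {lang.strip().lower() for lang in languages}
    unsupported = sorted(requested - LEGACY_LANGUAGES)
    if unsupported:
        raise ValueError(
            f"legacy OCR does not cover: {', '.join(unsupported)}"
        )
    return "legacy"


ocr_system = choose_ocr_system(["English", "Japanese"])
```

In this sketch, a legacy-dependent integration that receives a language outside the legacy list raises an error instead of silently switching systems, since that conflict likely needs a human decision. Falling back to standard would be an equally reasonable choice.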
For more information on OCR options, see the OCR Options documentation.