Supported Languages

Reducto’s OCR systems support a wide variety of languages across different writing systems. Which languages are supported depends on the OCR system you choose.

OCR Systems Overview

Reducto offers different OCR systems that can be specified using the ocr_system parameter in the advanced options:

  • highres: Optimized for documents with English, Spanish, Italian, Portuguese, French, and German text.
  • multilingual: Supports a comprehensive set of languages from around the world.
  • combined: Uses a combination of OCR systems to improve results on multilingual documents, at a small latency cost.
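As a rough illustration of passing one of these values, the Python sketch below validates an ocr_system choice and places it under an advanced-options key. Only the ocr_system parameter and its three values come from this page. The helper name build_advanced_options and the shape of the surrounding request dict are hypothetical, so check the API reference for the actual request layout.

```python
# The three ocr_system values described on this page.
OCR_SYSTEMS = ("highres", "multilingual", "combined")


def build_advanced_options(ocr_system: str) -> dict:
    """Return an advanced-options dict that selects an OCR system.

    Hypothetical helper: rejects values not listed on this page.
    """
    if ocr_system not in OCR_SYSTEMS:
        raise ValueError(
            f"unknown ocr_system {ocr_system!r}; expected one of {OCR_SYSTEMS}"
        )
    return {"ocr_system": ocr_system}


# Hypothetical request body; other request fields are omitted here.
request = {"advanced_options": build_advanced_options("multilingual")}
```

Validating locally catches a misspelled value before any request is sent.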

Languages Supported by OCR System

Highres OCR System

The highres OCR system is optimized for the following languages:

  • English
  • Spanish
  • Italian
  • Portuguese
  • French
  • German

This system provides high-quality OCR results for documents primarily in these languages.

Multilingual OCR System

The multilingual OCR system supports a much wider range of languages, categorized by their level of support:

Fully Supported Languages

The following languages are prioritized and regularly evaluated for quality:

Afrikaans, Albanian, Arabic, Armenian, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Yiddish
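To check a document's languages against this list before processing, a simple set difference is enough. The sketch below copies the list above into a frozenset. The function name unsupported_languages is hypothetical, and a language missing from the list is not necessarily rejected: it is only not among those prioritized and regularly evaluated for quality.

```python
# Fully supported languages for the multilingual OCR system, copied from the list above.
MULTILINGUAL_FULLY_SUPPORTED = frozenset({
    "Afrikaans", "Albanian", "Arabic", "Armenian", "Belarusian", "Bengali",
    "Bulgarian", "Catalan", "Chinese", "Croatian", "Czech", "Danish", "Dutch",
    "English", "Estonian", "Filipino", "Finnish", "French", "German", "Greek",
    "Gujarati", "Hebrew", "Hindi", "Hungarian", "Icelandic", "Indonesian",
    "Italian", "Japanese", "Kannada", "Khmer", "Korean", "Lao", "Latvian",
    "Lithuanian", "Macedonian", "Malay", "Malayalam", "Marathi", "Nepali",
    "Norwegian", "Persian", "Polish", "Portuguese", "Punjabi", "Romanian",
    "Russian", "Serbian", "Slovak", "Slovenian", "Spanish", "Swedish",
    "Tagalog", "Tamil", "Telugu", "Thai", "Turkish", "Ukrainian",
    "Vietnamese", "Yiddish",
})


def unsupported_languages(doc_languages):
    """Return, sorted, the languages not on the fully supported list (hypothetical helper)."""
    return sorted(set(doc_languages) - MULTILINGUAL_FULLY_SUPPORTED)


print(unsupported_languages(["Thai", "Swahili", "Korean"]))  # prints ['Swahili']
```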

Choosing the Right OCR System

  • Use highres for documents primarily in English, Spanish, Italian, Portuguese, French, or German.
  • Use multilingual for documents containing languages beyond those supported by highres.
  • Use combined for multilingual documents where you need the highest possible accuracy across multiple languages and can accept a small latency cost.
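The guidance above can be encoded as a small decision function. This is a sketch of one reading of those rules, not Reducto's own selection logic. It assumes the document's languages are already known, and the function name choose_ocr_system and its prefer_accuracy flag are hypothetical.

```python
# Languages the highres OCR system is optimized for, per this page.
HIGHRES_LANGUAGES = frozenset(
    {"English", "Spanish", "Italian", "Portuguese", "French", "German"}
)


def choose_ocr_system(doc_languages, prefer_accuracy=False):
    """Suggest an ocr_system value from a document's languages.

    prefer_accuracy=True means extra latency is acceptable in exchange
    for better accuracy on documents outside the highres languages.
    """
    languages = set(doc_languages)
    if not languages:
        raise ValueError("doc_languages must name at least one language")
    if languages <= HIGHRES_LANGUAGES:
        # Every language is covered by highres.
        return "highres"
    # At least one language falls outside highres.
    return "combined" if prefer_accuracy else "multilingual"


print(choose_ocr_system(["English", "French"]))           # prints highres
print(choose_ocr_system(["English", "Japanese"]))         # prints multilingual
print(choose_ocr_system(["English", "Japanese"], True))   # prints combined
```

The highres check runs first because this page recommends highres whenever a document stays within its six languages. combined is returned only when the caller explicitly opts into the latency trade-off.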

For more information on OCR options, see the OCR Options documentation.