# Processing Settings

> Control OCR, timeouts, output options, and document handling

The `settings` config group controls how documents are processed: which OCR system to use, how long to wait, what to include in the response, and how to handle special cases like password-protected files.

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      settings={
          "ocr_system": "standard",
          "timeout": 300,
          "page_range": {"start": 1, "end": 50}
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    settings: {
      ocr_system: 'standard',
      timeout: 300,
      page_range: { start: 1, end: 50 }
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "settings": {
        "ocr_system": "standard",
        "timeout": 300,
        "page_range": {"start": 1, "end": 50}
      }
    }'
  ```
</CodeGroup>

## OCR System

Reducto offers two OCR systems that determine how text is extracted from images and scanned documents.

<CodeGroup>
  ```python Python theme={null}
  settings={"ocr_system": "standard"}
  ```

  ```javascript Node.js theme={null}
  settings: { ocr_system: 'standard' }
  ```

  ```bash cURL theme={null}
  "settings": {"ocr_system": "standard"}
  ```
</CodeGroup>

**`standard` (default):** Our primary OCR engine supporting 60+ languages. Handles mixed-language documents automatically.

<Accordion title="Supported languages (standard OCR)">
  Afrikaans, Albanian, Arabic, Armenian, Belarusian, Bengali, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Filipino, Finnish, French, German, Greek, Gujarati, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Khmer, Korean, Lao, Latvian, Lithuanian, Macedonian, Malay, Malayalam, Marathi, Nepali, Norwegian, Persian, Polish, Portuguese, Punjabi, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swedish, Tagalog, Tamil, Telugu, Thai, Turkish, Ukrainian, Vietnamese, Yiddish
</Accordion>

**`legacy`:** An older engine optimized for Germanic languages only. Available for backwards compatibility with existing integrations. Use `standard` for new projects.

<Accordion title="Supported languages (legacy OCR)">
  English, German, Dutch, Norwegian, Swedish, Danish, Icelandic, Afrikaans
</Accordion>

<Note>
  The Go SDK uses different OCR system values (`highres`, `multilingual`, `combined`). Use Python, Node.js, or cURL for the `standard` and `legacy` options.
</Note>
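If you maintain an existing `legacy` integration, it can help to check whether the languages you expect are covered by the legacy engine at all. The sketch below hard-codes the legacy language list from the accordion above. `pick_ocr_system` is a hypothetical helper, not part of the SDK, and it returns `standard` unless the caller opts in to `legacy` and every language is on that list.

```python
# Languages listed above for the legacy engine.
LEGACY_LANGUAGES = {
    "english", "german", "dutch", "norwegian",
    "swedish", "danish", "icelandic", "afrikaans",
}


def pick_ocr_system(languages, prefer_legacy=False):
    """Return an `ocr_system` value for the given document languages.

    Hypothetical helper: `legacy` is only returned when the caller opts in
    and every language is on the legacy list; otherwise `standard`.
    """
    normalized = {lang.strip().lower() for lang in languages}
    if prefer_legacy and normalized and normalized <= LEGACY_LANGUAGES:
        return "legacy"
    return "standard"


settings = {"ocr_system": pick_ocr_system(["German", "Japanese"], prefer_legacy=True)}
print(settings)  # {'ocr_system': 'standard'}
```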

For maximum accuracy on difficult documents (handwriting, faded text, poor scans), combine with agentic text mode:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      settings={"ocr_system": "standard"},
      enhance={"agentic": [{"scope": "text"}]}
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    settings: { ocr_system: 'standard' },
    enhance: { agentic: [{ scope: 'text' }] }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "settings": {"ocr_system": "standard"},
      "enhance": {"agentic": [{"scope": "text"}]}
    }'
  ```
</CodeGroup>

See [Agentic Modes](/configs/parse/agentic-modes) for details on when to enable this.

## Extraction Mode

Controls how text is extracted from PDFs that have embedded text layers.

<CodeGroup>
  ```python Python theme={null}
  settings={"extraction_mode": "hybrid"}
  ```

  ```javascript Node.js theme={null}
  settings: { extraction_mode: 'hybrid' }
  ```

  ```bash cURL theme={null}
  "settings": {"extraction_mode": "hybrid"}
  ```
</CodeGroup>

**`hybrid` (default):** Uses the embedded text when it is good quality and falls back to OCR otherwise. Best when processing mixed document sets where some have reliable embedded text and others don't.

**`ocr`:** Uses optical character recognition only, ignoring any embedded text in the PDF. Best for scanned documents, images, or when embedded text is unreliable or corrupted.

**`metadata`:** Uses embedded text from PDF metadata only, without OCR. Best for native (digitally created) documents such as DOCX files or PDFs with reliable text layers, where you want faster processing.
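As a rough illustration of the guidance above, the three modes can be mapped from what you already know about a document. `pick_extraction_mode` and its two parameters are hypothetical names used only for this sketch. How you determine whether a text layer exists or can be trusted is up to your pipeline.

```python
def pick_extraction_mode(has_text_layer, text_layer_trusted=None):
    """Map what is known about a PDF to an `extraction_mode` value.

    has_text_layer: whether the PDF carries embedded text.
    text_layer_trusted: True or False if known, None if unknown.
    """
    if not has_text_layer or text_layer_trusted is False:
        return "ocr"        # scanned, or embedded text is unreliable
    if text_layer_trusted:
        return "metadata"   # reliable text layer, skip OCR for speed
    return "hybrid"         # quality unknown: keep the default


settings = {"extraction_mode": pick_extraction_mode(has_text_layer=True)}
print(settings)  # {'extraction_mode': 'hybrid'}
```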

## Page Range

Process only specific pages to save time and credits. See [Page Ranges](/configs/parse/page-ranges) for complete documentation.

<CodeGroup>
  ```python Python theme={null}
  # Pages 1-10 only
  settings={"page_range": {"start": 1, "end": 10}}

  # Multiple ranges
  settings={"page_range": [{"start": 1, "end": 5}, {"start": 20, "end": 25}]}
  ```

  ```javascript Node.js theme={null}
  // Pages 1-10 only
  settings: { page_range: { start: 1, end: 10 } }

  // Multiple ranges
  settings: { page_range: [{ start: 1, end: 5 }, { start: 20, end: 25 }] }
  ```

  ```go Go theme={null}
  // Pages 1-10 only
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      PageRange: reducto.F[shared.AdvancedProcessingOptionsPageRangeUnionParam](
          shared.PageRangeParam{
              Start: reducto.F(int64(1)),
              End:   reducto.F(int64(10)),
          },
      ),
  })
  ```

  ```bash cURL theme={null}
  # Pages 1-10 only
  "settings": {"page_range": {"start": 1, "end": 10}}

  # Multiple ranges
  "settings": {"page_range": [{"start": 1, "end": 5}, {"start": 20, "end": 25}]}
  ```
</CodeGroup>
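If you start from a list of individual page numbers, you can collapse them into the multi-range form shown above. This is a minimal sketch: `pages_to_ranges` is a hypothetical helper, and it assumes that `end` is inclusive and that a single-page range (`start` equal to `end`) is accepted.

```python
def pages_to_ranges(pages):
    """Collapse page numbers into a list of {"start", "end"} ranges."""
    ranges = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1]["end"] + 1:
            ranges[-1]["end"] = page      # extend the current run
        else:
            ranges.append({"start": page, "end": page})  # start a new run
    return ranges


settings = {"page_range": pages_to_ranges([1, 2, 3, 4, 5, 20, 21, 22, 23, 24, 25])}
print(settings)
# {'page_range': [{'start': 1, 'end': 5}, {'start': 20, 'end': 25}]}
```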

## Timeout

Set a maximum processing time in seconds. If processing exceeds this limit, the request fails rather than hanging indefinitely.

<CodeGroup>
  ```python Python theme={null}
  settings={"timeout": 300}  # 5 minutes
  ```

  ```javascript Node.js theme={null}
  settings: { timeout: 300 }  // 5 minutes
  ```

  ```bash cURL theme={null}
  "settings": {"timeout": 300}
  ```
</CodeGroup>

If not specified, Reducto uses internal defaults appropriate for the document size.
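One way to use an explicit timeout is to try a short limit first and retry with a longer one only if the request fails. The sketch below is not SDK code: `run_with_timeouts` is a hypothetical wrapper, `run` is any callable you supply that takes a `settings` dict, and the exception type raised on timeout is an assumption you should replace with whatever your client actually raises.

```python
def run_with_timeouts(run, timeouts=(60, 300), timeout_error=TimeoutError):
    """Call run(settings) with each timeout in turn until one succeeds.

    `timeout_error` is an assumption: pass the exception class your
    client raises when processing exceeds the limit.
    """
    if not timeouts:
        raise ValueError("at least one timeout is required")
    last_error = None
    for seconds in timeouts:
        try:
            return run({"timeout": seconds})
        except timeout_error as exc:
            last_error = exc          # retry with the next, longer limit
    raise last_error


attempts = []


def fake_run(settings):
    # Stand-in for a real parse call: fails until the limit reaches 300 s.
    attempts.append(settings["timeout"])
    if settings["timeout"] < 300:
        raise TimeoutError("processing exceeded the limit")
    return "parsed"


print(run_with_timeouts(fake_run), attempts)  # parsed [60, 300]
```

In a real integration, `run` could be a small lambda that forwards `settings` to `client.parse.run` together with your `input`.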

## Password-Protected Documents

For encrypted PDFs that require a password to open:

<CodeGroup>
  ```python Python theme={null}
  settings={"document_password": "secret123"}
  ```

  ```javascript Node.js theme={null}
  settings: { document_password: 'secret123' }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      DocumentPassword: reducto.F("secret123"),
  })
  ```

  ```bash cURL theme={null}
  "settings": {"document_password": "secret123"}
  ```
</CodeGroup>

The password is used to decrypt the document before processing. It is transmitted securely and is not stored.
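To avoid hard-coding the password in source files, you could read it from the environment when building `settings`. This is a sketch only: `password_settings` and the variable name `REDUCTO_DOC_PASSWORD` are hypothetical, so substitute whatever your secret store exposes.

```python
import os


def password_settings(env_var="REDUCTO_DOC_PASSWORD", environ=None):
    """Build settings with `document_password` read from the environment.

    `environ` defaults to os.environ; pass a dict to test without
    touching the real environment.
    """
    environ = os.environ if environ is None else environ
    password = environ.get(env_var)
    if not password:
        raise RuntimeError(f"set {env_var} before parsing encrypted PDFs")
    return {"document_password": password}


settings = password_settings(environ={"REDUCTO_DOC_PASSWORD": "secret123"})
print(settings)  # {'document_password': 'secret123'}
```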

## Return Images

By default, blocks contain only extracted text. Enable `return_images` to get pre-signed URLs pointing to cropped images of specific block types:

<CodeGroup>
  ```python Python theme={null}
  settings={"return_images": ["figure", "table"]}
  ```

  ```javascript Node.js theme={null}
  settings: { return_images: ['figure', 'table'] }
  ```

  ```bash cURL theme={null}
  "settings": {"return_images": ["figure", "table"]}
  ```
</CodeGroup>

When enabled, applicable blocks include an `image_url` field:

```json theme={null}
{
  "type": "Figure",
  "bbox": {"left": 0.1, "top": 0.2, "width": 0.8, "height": 0.4, "page": 1},
  "content": "Bar chart showing quarterly revenue growth from Q1 to Q4...",
  "image_url": "https://storage.reducto.ai/figures/abc123.png?X-Amz-Expires=3600..."
}
```

The URL is a pre-signed S3 link valid for a limited time. Download or process the image before expiration.

**Options:**

* `figure`: Cropped images for figure blocks (charts, diagrams, photos, illustrations)
* `table`: Cropped images for table blocks
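Because the URLs expire, it is often convenient to collect them right after parsing. The sketch below works on plain dictionaries shaped like the block shown above. `image_urls` is a hypothetical helper, and where the blocks sit inside the full response is not shown here, so the function simply takes an iterable of blocks. The `"Text"` block type and the `example.com` URL in the sample data are placeholders for illustration.

```python
def image_urls(blocks):
    """Return (type, image_url) pairs for blocks that carry an image."""
    return [
        (block.get("type"), block["image_url"])
        for block in blocks
        if block.get("image_url")
    ]


blocks = [
    {
        "type": "Figure",
        "content": "Bar chart showing quarterly revenue growth",
        "image_url": "https://example.com/figures/abc123.png",  # placeholder URL
    },
    {"type": "Text", "content": "No image here"},  # assumed type name
]
print(image_urls(blocks))
# [('Figure', 'https://example.com/figures/abc123.png')]
```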

## Return OCR Data

Returns the raw OCR output with word-level and line-level bounding boxes. This gives you access to the underlying text extraction before Reducto's layout analysis.

<CodeGroup>
  ```python Python theme={null}
  settings={"return_ocr_data": True}
  ```

  ```javascript Node.js theme={null}
  settings: { return_ocr_data: true }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      ReturnOcrData: reducto.F(true),
  })
  ```

  ```bash cURL theme={null}
  "settings": {"return_ocr_data": true}
  ```
</CodeGroup>

The response `result` object includes an `ocr` field containing `words` and `lines` arrays:

```json theme={null}
{
  "job_id": "parse_abc123xyz",
  "result": {
    "type": "full",
    "chunks": [...],
    "ocr": {
      "words": [
        {
          "text": "Revenue",
          "bbox": {"left": 0.12, "top": 0.08, "width": 0.15, "height": 0.02, "page": 1},
          "confidence": 0.98,
          "rotation": 0
        }
      ],
      "lines": [
        {
          "text": "Revenue Report Q4 2024",
          "bbox": {"left": 0.12, "top": 0.08, "width": 0.45, "height": 0.02, "page": 1},
          "confidence": 0.97,
          "rotation": 0
        }
      ]
    }
  }
}
```

Each word and line includes:

* `text`: The recognized text
* `bbox`: Normalized bounding box (coordinates as fractions of page dimensions)
* `confidence`: OCR confidence score between 0 and 1
* `rotation`: Detected rotation angle in degrees (0-360, counterclockwise)
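Two common uses of this data are scaling the normalized boxes to pixels for drawing overlays and flagging low-confidence text for review. The following sketch operates on dictionaries shaped like the JSON above. `to_pixels` and `low_confidence` are hypothetical helpers, the page dimensions are values you supply from your own rendering, and the 0.9 threshold is arbitrary.

```python
def to_pixels(bbox, page_width, page_height):
    """Scale a normalized bbox to whole-pixel coordinates for a rendered page."""
    return {
        "left": round(bbox["left"] * page_width),
        "top": round(bbox["top"] * page_height),
        "width": round(bbox["width"] * page_width),
        "height": round(bbox["height"] * page_height),
        "page": bbox["page"],
    }


def low_confidence(items, threshold=0.9):
    """Return OCR words or lines whose confidence is below the threshold."""
    return [item for item in items if item["confidence"] < threshold]


word = {
    "text": "Revenue",
    "bbox": {"left": 0.12, "top": 0.08, "width": 0.15, "height": 0.02, "page": 1},
    "confidence": 0.98,
    "rotation": 0,
}
box = to_pixels(word["bbox"], page_width=1000, page_height=2000)
print(box)  # {'left': 120, 'top': 160, 'width': 150, 'height': 40, 'page': 1}
```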

## Persist Results

By default, processed results are stored temporarily and eventually deleted. Enable persistence to keep results indefinitely in long-term storage:

<CodeGroup>
  ```python Python theme={null}
  settings={"persist_results": True}
  ```

  ```javascript Node.js theme={null}
  settings: { persist_results: true }
  ```

  ```bash cURL theme={null}
  "settings": {"persist_results": true}
  ```
</CodeGroup>

When enabled, you can retrieve results later using the job ID without reprocessing the document. The response includes a `job_id` that serves as the retrieval key:

```json theme={null}
{
  "job_id": "parse_abc123xyz",
  "duration": 2.34,
  "result": {...}
}
```

Retrieve stored results later:

<CodeGroup>
  ```python Python theme={null}
  job = client.job.get("parse_abc123xyz")
  result = job.result
  ```

  ```javascript Node.js theme={null}
  const job = await client.job.retrieve('parse_abc123xyz');
  const result = job.result;
  ```

  ```bash cURL theme={null}
  curl https://platform.reducto.ai/job/parse_abc123xyz \
    -H "Authorization: Bearer $REDUCTO_API_KEY"
  ```
</CodeGroup>

<Note>Requires opting in to Reducto Studio. Contact support to enable this feature for your organization.</Note>

## Embed PDF Metadata

Embeds the OCR-extracted text back into the PDF as a hidden text layer. The response includes a `pdf_url` pointing to the enhanced PDF:

<CodeGroup>
  ```python Python theme={null}
  settings={"embed_pdf_metadata": True}
  ```

  ```javascript Node.js theme={null}
  settings: { embed_pdf_metadata: true }
  ```

  ```bash cURL theme={null}
  "settings": {"embed_pdf_metadata": true}
  ```
</CodeGroup>

```json theme={null}
{
  "job_id": "parse_abc123xyz",
  "pdf_url": "https://storage.reducto.ai/pdfs/abc123.pdf?...",
  "result": {...}
}
```

The returned PDF looks identical to the original but now supports:

* Text selection and copy/paste in PDF viewers
* Full-text search within the document
* Accessibility features (screen readers can read the text)

## Force URL Result

By default, Reducto returns the full result inline in the response. For very large documents, it automatically switches to returning a URL. You can force URL mode regardless of size:

<CodeGroup>
  ```python Python theme={null}
  settings={"force_url_result": True}
  ```

  ```javascript Node.js theme={null}
  settings: { force_url_result: true }
  ```

  ```go Go theme={null}
  Options: reducto.F(shared.BaseProcessingOptionsParam{
      ForceURLResult: reducto.F(true),
  })
  ```

  ```bash cURL theme={null}
  "settings": {"force_url_result": true}
  ```
</CodeGroup>

When enabled, the response contains a URL instead of inline content:

```json theme={null}
{
  "job_id": "parse_abc123xyz",
  "result": {
    "type": "url",
    "url": "https://storage.reducto.ai/results/abc123.json?...",
    "result_id": "abc123"
  }
}
```

Fetch the full result by downloading from the URL. The URL is pre-signed and valid for a limited time.
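Since large documents can switch to URL mode on their own, client code may want to handle both shapes in one place. The sketch below works on the `result` object as a plain dictionary (for example, decoded from the raw JSON response). `resolve_result` and `fetch_json` are hypothetical helpers, and the sketch assumes the downloaded JSON has the same shape as an inline result.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document from a pre-signed URL."""
    with urlopen(url) as response:
        return json.load(response)


def resolve_result(result, fetch=fetch_json):
    """Return the full result, downloading it first if it came back as a URL."""
    if result.get("type") == "url":
        return fetch(result["url"])
    return result


inline = {"type": "full", "chunks": []}
print(resolve_result(inline) is inline)  # True
```

The `fetch` parameter exists so the download step can be swapped out, for example to add retries or to stub it in tests.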

## Force File Extension

Reducto automatically detects file types from URLs and content. Override it when automatic detection fails or picks the wrong type:

<CodeGroup>
  ```python Python theme={null}
  settings={"force_file_extension": ".pdf"}
  ```

  ```javascript Node.js theme={null}
  settings: { force_file_extension: '.pdf' }
  ```

  ```go Go theme={null}
  AdvancedOptions: reducto.F(shared.AdvancedProcessingOptionsParam{
      ForceFileExtension: reducto.F(".pdf"),
  })
  ```

  ```bash cURL theme={null}
  "settings": {"force_file_extension": ".pdf"}
  ```
</CodeGroup>

Common scenarios:

* URLs without file extensions (e.g., `https://api.example.com/document/12345`)
* URLs with misleading extensions
* Pre-signed URLs with complex query parameters that confuse detection

Valid extensions include `.pdf`, `.png`, `.jpg`, `.docx`, `.xlsx`, `.pptx`, and all other [supported file types](/upload/overview#supported-file-types).
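If you already know the real type of each document, you can add the override only when the URL path does not carry that extension. This is a minimal sketch: `extension_settings` is a hypothetical helper, and `KNOWN_EXTENSIONS` contains only the extensions named above, not the full supported list.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions named above; extend with the other supported file types as needed.
KNOWN_EXTENSIONS = {".pdf", ".png", ".jpg", ".docx", ".xlsx", ".pptx"}


def extension_settings(url, expected):
    """Return settings forcing `expected` when the URL path lacks that extension."""
    if expected not in KNOWN_EXTENSIONS:
        raise ValueError(f"unexpected extension: {expected}")
    # Only the path is inspected, so query parameters cannot confuse the check.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix == expected:
        return {}
    return {"force_file_extension": expected}


print(extension_settings("https://api.example.com/document/12345", ".pdf"))
# {'force_file_extension': '.pdf'}
```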
