# LLM & service configuration options

> Complete guide to LLM configuration and environment variables for Reducto

## OCR service configuration

For detailed OCR provider configuration (AWS Textract, Azure Vision, GCP Vision API, cross-cloud OCR, and GPU OCR deployment), see the dedicated [OCR provider configuration](/onprem/ocr_options) page.

## LLM provider environment variables

Reducto supports multiple LLM providers, each configured through environment variables. The tables below list every supported provider, its environment variables, and whether each variable is required.

### LiteLLM proxy

| Variable                       | Description                              | Required |
| ------------------------------ | ---------------------------------------- | -------- |
| `LITELLM_PROXY_URL`            | URL of the LiteLLM Proxy                 | Yes      |
| `LITELLM_PROXY_FAST_MODEL`     | Fast model to route to via the proxy     | Yes      |
| `LITELLM_PROXY_ACCURATE_MODEL` | Accurate model to route to via the proxy | Yes      |

### OpenAI

| Variable         | Description         | Required |
| ---------------- | ------------------- | -------- |
| `OPENAI_API_KEY` | Your OpenAI API key | Yes      |

### Azure OpenAI

| Variable                 | Description                                                                                                                                                                                                                                                                                                                                                                                                                        | Required |
| ------------------------ | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| `AZURE_OPENAI_API_KEY`   | Your Azure OpenAI API key                                                                                                                                                                                                                                                                                                                                                                                                          | Yes      |
| `AZURE_OPENAI_ENDPOINT`  | Azure OpenAI endpoint (e.g., `https://your-resource-name.openai.azure.com/`)                                                                                                                                                                                                                                                                                                                                                       | Yes      |
| `OPENAI_API_VERSION`     | Azure OpenAI API version (e.g., `2024-10-21`)                                                                                                                                                                                                                                                                                                                                                                                      | Yes      |
| `AZURE_OPENAI_MODEL_MAP` | Comma-separated map (or single default deployment) used to translate model names to Azure deployment names. Reducto uses the following models and each should resolve to a deployment unless a single default is supplied: `gpt-4o-2024-08-06`, `gpt-4o`, `gpt-4o-mini-2024-07-18`, `gpt-4o-mini`, `gpt-4.1`, `o1`. Example mappings: `my-default-deployment` (single default) or `gpt-4o=my-prod-dep, gpt-4o-mini=gpt4o-mini-dep` | Yes      |
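
The mapping format described for `AZURE_OPENAI_MODEL_MAP` can be illustrated with a short Python sketch. This is not Reducto's parser: the function names are hypothetical, and treating a bare entry (one without `=`) as the default deployment is an assumption drawn from the examples in the table.

```python
from typing import Dict, List, Optional, Tuple

# Model names listed in the table above as used by Reducto.
REDUCTO_MODELS = [
    "gpt-4o-2024-08-06", "gpt-4o", "gpt-4o-mini-2024-07-18",
    "gpt-4o-mini", "gpt-4.1", "o1",
]


def parse_model_map(raw: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Split a model-map string into explicit mappings and an optional default."""
    mappings: Dict[str, str] = {}
    default: Optional[str] = None
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "=" in entry:
            model, deployment = entry.split("=", 1)
            mappings[model.strip()] = deployment.strip()
        else:
            # Assumption: a bare entry is the single default deployment.
            default = entry
    return mappings, default


def resolve_deployment(model: str, raw: str) -> str:
    """Return the deployment for a model, falling back to the default if present."""
    mappings, default = parse_model_map(raw)
    if model in mappings:
        return mappings[model]
    if default is not None:
        return default
    raise KeyError(f"No Azure deployment configured for model {model!r}")


def unmapped_models(raw: str) -> List[str]:
    """List the documented models that would not resolve to any deployment."""
    mappings, default = parse_model_map(raw)
    if default is not None:
        return []
    return [m for m in REDUCTO_MODELS if m not in mappings]
```

Running `unmapped_models` against a candidate value before deploying is one way to catch a model name that has no deployment and no default to fall back on.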

### Anthropic

| Variable            | Description            | Required |
| ------------------- | ---------------------- | -------- |
| `ANTHROPIC_API_KEY` | Your Anthropic API key | Yes      |

### Google

| Variable                         | Description                                                                                                                     | Required |
| -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------- | -------- |
| `GOOGLE_APPLICATION_CREDENTIALS` | Service account key JSON with `roles/aiplatform.user` role for Vertex AI                                                        | Yes      |
| `GCP_PROJECT_ID`                 | GCP project for Cloud Vision API                                                                                                | Yes      |
| `GCP_REGION`                     | Region for Vertex AI, defaults to `us-central1`                                                                                 | No       |
| `GCP_API_KEY`                    | [API key](https://console.cloud.google.com/apis/credentials) with no Application or API restrictions to access Cloud Vision API | Yes      |

### Gemini

| Variable         | Description         | Required |
| ---------------- | ------------------- | -------- |
| `GEMINI_API_KEY` | Your Gemini API key | Yes      |

### AWS Bedrock

| Variable                | Description                                       | Required                |
| ----------------------- | ------------------------------------------------- | ----------------------- |
| `USE_CLAUDE_BEDROCK`    | Set to any value to enable Claude via AWS Bedrock | Yes                     |
| `AWS_ACCESS_KEY_ID`     | AWS access key ID                                 | Yes, when using Bedrock |
| `AWS_SECRET_ACCESS_KEY` | AWS secret access key                             | Yes, when using Bedrock |
| `AWS_REGION`            | AWS region name                                   | Yes, when using Bedrock |
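
A quick pre-deployment check of the variables marked as required in the tables above might look like the following sketch. The provider labels used as dictionary keys (`litellm_proxy`, `azure_openai`, and so on) are hypothetical names chosen for this example; only the environment variable names come from the tables.

```python
import os
from typing import Dict, List, Mapping

# Variables marked "Yes" in the tables above, grouped under illustrative labels.
REQUIRED_VARS: Dict[str, List[str]] = {
    "litellm_proxy": [
        "LITELLM_PROXY_URL",
        "LITELLM_PROXY_FAST_MODEL",
        "LITELLM_PROXY_ACCURATE_MODEL",
    ],
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "OPENAI_API_VERSION",
        "AZURE_OPENAI_MODEL_MAP",
    ],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID", "GCP_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "aws_bedrock": [
        "USE_CLAUDE_BEDROCK",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_REGION",
    ],
}


def missing_vars(provider: str, env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables for a provider that are unset or empty."""
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose required variables are all present."""
    return [p for p in REQUIRED_VARS if not missing_vars(p, env)]
```

Calling `missing_vars("azure_openai")` in the target environment returns an empty list when the provider is fully configured, and the names of the missing variables otherwise.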

## GPU-based extraction models

Reducto offers GPU-based models for structured data extraction and fine-grained citations. For best results, deploy both models together. Model weights are downloaded from HuggingFace using a scoped token provided by Reducto.

### Prerequisites

Create a Kubernetes secret with the HuggingFace token provided by Reducto:

```bash theme={null}
kubectl create secret generic reducto-hf-token --from-literal=HF_TOKEN=hf_...
```

We recommend enabling `modelStorage` to cache the weights on a PVC so that pod restarts don't download them again.

### YAML extraction model (30B)

**GPU requirement:** 1x NVIDIA H200 (will not fit on H100/A100/A10G).

```yaml theme={null}
yamlExtract:
  enabled: true
  gpu: "H200"
  modelStorage:
    enabled: true
    size: "100Gi"
    storageClassName: "your-storage-class"
```

When enabled, `REDUCTO_YAML_EXTRACT_URL` is automatically injected into all worker and HTTP pods.

### Citation model (7B)

**GPU requirement:** 1x NVIDIA H100 or H200.

```yaml theme={null}
citationModel:
  enabled: true
  gpu: "H200"  # or "H100"
  modelStorage:
    enabled: true
    size: "50Gi"
    storageClassName: "your-storage-class"
```

When enabled, `REDUCTO_CITATION_URL` is automatically injected into all worker and HTTP pods. If not deployed, citations fall back to your configured external LLM provider.

### Model path overrides

Both deployments expose a `modelPath` field that can be updated if Reducto ships new model weights:

```yaml theme={null}
yamlExtract:
  modelPath: "reducto/extract_30b_0108"  # update when directed by Reducto

citationModel:
  modelPath: "reducto/citation_7b_mimo_0812"  # update when directed by Reducto
```

### Extraction without GPU models

If you do not deploy either GPU model, extraction uses your configured external LLM provider (OpenAI, Anthropic, Google, Azure, or Bedrock). No additional configuration is needed.

### Fine-tuned OpenAI extraction model (alternative)

| Variable                        | Description                                                           | Required |
| ------------------------------- | --------------------------------------------------------------------- | -------- |
| `LOCAL_EXTRACT_CITATIONS_MODEL` | Fine-tuned OpenAI model ID (e.g., `openai:ft:gpt-4.1-2025-04-14:...`) | No       |

When `LOCAL_EXTRACT_CITATIONS_MODEL` is set, the fine-tuned model takes priority over the self-hosted extraction model.
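
The priority described in this section can be summarized in a small Python sketch. It is an illustration only: the returned labels are hypothetical, and checking `REDUCTO_YAML_EXTRACT_URL` as the signal that the self-hosted model is deployed is an assumption based on the variable being injected when `yamlExtract` is enabled.

```python
from typing import Mapping


def extraction_backend(env: Mapping[str, str]) -> str:
    """Pick the extraction backend following the priority described above."""
    # 1. A fine-tuned OpenAI model takes priority when configured.
    if env.get("LOCAL_EXTRACT_CITATIONS_MODEL"):
        return "fine_tuned_openai"
    # 2. Assumption: the injected URL indicates the self-hosted GPU model is deployed.
    if env.get("REDUCTO_YAML_EXTRACT_URL"):
        return "self_hosted_gpu"
    # 3. Otherwise extraction uses the configured external LLM provider.
    return "external_llm_provider"
```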

## Request-level LLM overrides

In addition to environment variables, on-prem deployments can override LLM configuration at the request level using the `overrides` parameter in `experimental_options`.

### Key-value processing overrides

Override the model and add custom instructions for key-value (form) region processing:

```json theme={null}
{
  "document_url": "https://example.com/form.pdf",
  "experimental_options": {
    "overrides": {
      "key_value": {
        "model": "google:gemini-2.5-flash-lite",
        "custom_instructions": "Pay special attention to date fields. Use MM/DD/YYYY format."
      }
    }
  }
}
```

| Field                 | Description                                                 |
| --------------------- | ----------------------------------------------------------- |
| `model`               | Model alias (`fast`, `accurate`) or `provider:model` format |
| `custom_instructions` | Additional instructions appended to the default prompt      |
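
As a sketch of how a client might assemble this request body, the Python helper below builds the same structure and checks the `model` value against the two accepted shapes from the table. The helper names are hypothetical, and the format check is a client-side convenience assumed for this example, not a description of server-side validation.

```python
from typing import Any, Dict, Optional

MODEL_ALIASES = {"fast", "accurate"}


def is_valid_model(model: str) -> bool:
    """Accept a known alias or a non-empty `provider:model` pair."""
    if model in MODEL_ALIASES:
        return True
    provider, sep, name = model.partition(":")
    return bool(sep and provider and name)


def build_kv_override_request(
    document_url: str,
    model: Optional[str] = None,
    custom_instructions: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a request body with key-value overrides under experimental_options."""
    key_value: Dict[str, str] = {}
    if model is not None:
        if not is_valid_model(model):
            raise ValueError(f"Expected an alias or provider:model, got {model!r}")
        key_value["model"] = model
    if custom_instructions:
        key_value["custom_instructions"] = custom_instructions
    return {
        "document_url": document_url,
        "experimental_options": {"overrides": {"key_value": key_value}},
    }
```

Passing the result to `json.dumps` yields a body with the same shape as the JSON example above.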

### Resolution order

**Model resolution:**

1. **Request override** - `experimental_options.overrides.key_value.model`
2. **Environment variable** - `LOCAL_KV_MODEL`
3. **Code default** - Based on deployment configuration

**Prompt resolution:**

1. **Base prompt** - `LOCAL_KV_PROMPT` env var, or built-in default
2. **Custom instructions** - Appended from `overrides.key_value.custom_instructions`
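
The two resolution orders above can be sketched in Python as follows. The function names, the `code_default` and `builtin_prompt` parameters, and the blank-line separator used when appending custom instructions are all assumptions made for illustration; the actual defaults and prompt assembly are internal to Reducto.

```python
from typing import Any, Mapping


def _kv_overrides(request: Mapping[str, Any]) -> Mapping[str, Any]:
    """Read experimental_options.overrides.key_value, tolerating missing keys."""
    return (
        request.get("experimental_options", {})
        .get("overrides", {})
        .get("key_value", {})
    )


def resolve_kv_model(
    request: Mapping[str, Any], env: Mapping[str, str], code_default: str
) -> str:
    """Request override first, then LOCAL_KV_MODEL, then the code default."""
    return (
        _kv_overrides(request).get("model")
        or env.get("LOCAL_KV_MODEL")
        or code_default
    )


def resolve_kv_prompt(
    request: Mapping[str, Any], env: Mapping[str, str], builtin_prompt: str
) -> str:
    """Use LOCAL_KV_PROMPT or the built-in prompt, then append custom instructions."""
    base = env.get("LOCAL_KV_PROMPT") or builtin_prompt
    extra = _kv_overrides(request).get("custom_instructions")
    # Assumption: instructions are appended after a blank line.
    return f"{base}\n\n{extra}" if extra else base
```

Note the asymmetry: a request-level `model` replaces the environment value, while request-level `custom_instructions` add to the base prompt rather than replacing it.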

### Environment variable defaults

| Variable          | Description                                           | Default                      |
| ----------------- | ----------------------------------------------------- | ---------------------------- |
| `LOCAL_KV_MODEL`  | Override model for KV processing                      | None (uses built-in cascade) |
| `LOCAL_KV_PROMPT` | Base prompt for KV processing (can be fully replaced) | Built-in prompt              |

## AI usage tracking

Reducto tracks language model consumption throughout the document processing pipeline. The tracking data reports token usage, request counts, and the models used, which supports billing and optimization.

### How AI usage tracking works

The AI usage tracking system operates at the block level within the parsing pipeline:

1. **Token Counting**: Each AI operation (table summarization, figure analysis, key-value extraction, etc.) records token consumption
2. **Request Tracking**: The system counts API calls made to each model
3. **Model Identification**: Usage is tracked per model type with provider information
4. **Aggregation**: Usage is aggregated across all blocks and pages for comprehensive reporting
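
The aggregation step can be pictured with a minimal Python sketch that sums per-block usage records into one entry per model. It assumes each block-level record carries the same field names as the `/parse` response (`modelType`, `promptTokenCount`, and so on); the internal record format is not documented here, so treat this as an illustration rather than the actual implementation.

```python
from typing import Any, Dict, Iterable, List

COUNT_FIELDS = (
    "promptTokenCount",
    "completionTokenCount",
    "cachedTokenCount",
    "requestCount",
)


def aggregate_usage(block_records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Sum token and request counts across blocks, grouped by model type."""
    totals: Dict[str, Dict[str, Any]] = {}
    for record in block_records:
        model = record["modelType"]
        if model not in totals:
            # First record for this model: start all counters at zero.
            totals[model] = {
                "modelType": model,
                "modelProvider": record.get("modelProvider"),
                **{field: 0 for field in COUNT_FIELDS},
            }
        for field in COUNT_FIELDS:
            totals[model][field] += record.get(field, 0)
    return list(totals.values())
```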

### Available via /parse API

AI usage information is **currently only available through the `/parse` API endpoint**, using the `custom_format` parameter. Other API endpoints do not return it.

### Usage information structure

When enabled, the system returns an `AIUsageInfo` object containing:

```json theme={null}
{
  "did_use_ai_models": true,
  "ai_usage_info": [
    {
      "promptTokenCount": 1500,
      "completionTokenCount": 300,
      "cachedTokenCount": 0,
      "requestCount": 2,
      "modelType": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
      "modelProvider": "anthropic",
      "modelRateLimitFamily": "us.anthropic.claude-3-7-sonnet"
    }
  ]
}
```

### Field descriptions

* **`did_use_ai_models`**: Boolean indicating whether any AI models were used during processing
* **`ai_usage_info`**: Array of usage information objects, one per model type used
* **`promptTokenCount`**: Total input tokens sent to the model
* **`completionTokenCount`**: Total output tokens generated by the model
* **`cachedTokenCount`**: Total cached tokens used (when supported by provider)
* **`requestCount`**: Number of API calls made to this model
* **`modelType`**: Standardized model identifier
* **`modelProvider`**: Provider name (e.g., "anthropic", "openai")
* **`modelRateLimitFamily`**: Rate limiting group for the model
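
A client consuming this structure might parse it as in the following Python sketch. The `ModelUsage` dataclass and both function names are hypothetical; only the JSON field names come from the structure documented above. The per-provider total counts prompt plus completion tokens and leaves cached tokens out, which is a choice made for this example.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ModelUsage:
    model_type: str
    model_provider: str
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int
    request_count: int


def parse_usage(payload: Dict[str, Any]) -> List[ModelUsage]:
    """Convert the usage payload into typed records, one per model type."""
    if not payload.get("did_use_ai_models"):
        return []
    return [
        ModelUsage(
            model_type=item["modelType"],
            model_provider=item["modelProvider"],
            prompt_tokens=item.get("promptTokenCount", 0),
            completion_tokens=item.get("completionTokenCount", 0),
            cached_tokens=item.get("cachedTokenCount", 0),
            request_count=item.get("requestCount", 0),
        )
        for item in payload.get("ai_usage_info", [])
    ]


def tokens_by_provider(usages: List[ModelUsage]) -> Dict[str, int]:
    """Sum prompt plus completion tokens for each provider."""
    totals: Dict[str, int] = {}
    for usage in usages:
        tokens = usage.prompt_tokens + usage.completion_tokens
        totals[usage.model_provider] = totals.get(usage.model_provider, 0) + tokens
    return totals
```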

### Enabling AI usage tracking

To retrieve AI usage information, set the `custom_format` parameter to `"ai_usage"` in your `/parse` request:

```json theme={null}
{
  "input": "your_document_url",
  "settings": {
    "custom_format": "ai_usage"
  }
}
```

### Tracked AI operations

The system tracks usage from these AI-powered features:

* **Table Summarization**: Analysis and description of complex tables
* **Figure Summarization**: Analysis and description of images and charts
* **Key-Value Enrichment**: Enrichment for form-like regions within documents

### Model name standardization

The system automatically standardizes model names for consistent reporting:

* Internal model identifiers are mapped to standard formats
* Provider information is automatically added
* Rate limit families are identified for capacity planning

### Possible model identifiers

If you have OpenAI and Anthropic access enabled, the following model identifiers may appear in the `modelType` field of AI usage tracking responses:

#### OpenAI models

* `gpt-4o-2024-08-06`
* `gpt-4o-mini-2024-07-18`

#### Anthropic models

* `claude-haiku-4-5-20251001`
* `claude-3-7-sonnet-20250219`

If you enable other model providers, their model identifiers appear in `modelType` with their own provider-specific prefixes.
