OCR service configuration
For detailed OCR provider configuration (AWS Textract, Azure Vision, GCP Vision API, cross-cloud OCR, and GPU OCR deployment), see the dedicated OCR provider configuration page.
LLM provider environment variables
Reducto supports multiple LLM providers through environment variables. Below is a complete list of supported providers and their required environment variables.
LiteLLM proxy
| Variable | Description | Required |
|---|---|---|
| LITELLM_PROXY_URL | URL of the LiteLLM Proxy | Yes |
| LITELLM_PROXY_FAST_MODEL | Fast model to route to via the proxy | Yes |
| LITELLM_PROXY_ACCURATE_MODEL | Accurate model to route to via the proxy | Yes |
OpenAI
| Variable | Description | Required |
|---|---|---|
| OPENAI_API_KEY | Your OpenAI API key | Yes |
Azure OpenAI
| Variable | Description | Required |
|---|---|---|
| AZURE_OPENAI_API_KEY | Your Azure OpenAI API key | Yes |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI endpoint (e.g., https://your-resource-name.openai.azure.com/) | Yes |
| OPENAI_API_VERSION | Azure OpenAI API version (e.g., 2024-10-21) | Yes |
| AZURE_OPENAI_MODEL_MAP | Comma-separated map (or single default deployment) used to translate model names to Azure deployment names. Reducto uses the following models and each should resolve to a deployment unless a single default is supplied: gpt-4o-2024-08-06, gpt-4o, gpt-4o-mini-2024-07-18, gpt-4o-mini, gpt-4.1, o1. Example mappings: my-default-deployment (single default) or gpt-4o=my-prod-dep, gpt-4o-mini=gpt4o-mini-dep | Yes |
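To make the two accepted shapes of AZURE_OPENAI_MODEL_MAP concrete, here is a minimal Python sketch of how such a value could be interpreted. It is an illustration only, not Reducto's parser: the function names `parse_model_map` and `resolve_deployment` are hypothetical, and combining explicit mappings with a bare default in one value is an assumption the table does not confirm.

```python
from typing import Dict, Optional, Tuple


def parse_model_map(raw: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Split a map value into (explicit mappings, default deployment).

    Hypothetical helper: entries containing "=" are treated as
    model=deployment pairs; a bare entry is treated as the default.
    """
    mappings: Dict[str, str] = {}
    default: Optional[str] = None
    for entry in raw.split(","):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing commas and stray whitespace
        if "=" in entry:
            model, deployment = entry.split("=", 1)
            mappings[model.strip()] = deployment.strip()
        else:
            default = entry
    return mappings, default


def resolve_deployment(model: str, raw: str) -> str:
    """Return the Azure deployment name that a model name resolves to."""
    mappings, default = parse_model_map(raw)
    if model in mappings:
        return mappings[model]
    if default is not None:
        return default
    raise KeyError(f"No Azure deployment configured for model {model!r}")
```

With the example values from the table, `resolve_deployment("gpt-4o", "gpt-4o=my-prod-dep, gpt-4o-mini=gpt4o-mini-dep")` yields `my-prod-dep`, while a single default such as `my-default-deployment` is returned for every model name.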
Anthropic
| Variable | Description | Required |
|---|---|---|
| ANTHROPIC_API_KEY | Your Anthropic API key | Yes |
Google
| Variable | Description | Required |
|---|---|---|
| GOOGLE_APPLICATION_CREDENTIALS | Service account key JSON with the roles/aiplatform.user role for Vertex AI | Yes |
| GCP_PROJECT_ID | GCP project for Cloud Vision API | Yes |
| GCP_REGION | Region for Vertex AI, defaults to us-central1 | No |
| GCP_API_KEY | API key with no Application or API restrictions to access Cloud Vision API | Yes |
Gemini
| Variable | Description | Required |
|---|---|---|
| GEMINI_API_KEY | Your Gemini API key | Yes |
AWS Bedrock
| Variable | Description | Required |
|---|---|---|
| USE_CLAUDE_BEDROCK | Set to any value to enable Claude via AWS Bedrock | Yes |
| AWS_ACCESS_KEY_ID | AWS access key ID | Yes, when using Bedrock |
| AWS_SECRET_ACCESS_KEY | AWS secret access key | Yes, when using Bedrock |
| AWS_REGION | AWS region name | Yes, when using Bedrock |
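Before starting a deployment, it can help to confirm that every variable marked as required for a provider is actually set. The Python sketch below is a hypothetical preflight check, not a Reducto tool: the provider labels used as dictionary keys and the `missing_vars` function are illustrative names, while the variable names are taken from the tables above. Optional variables such as GCP_REGION are deliberately left out.

```python
import os
from typing import Dict, List, Mapping, Optional

# Required variables per provider, copied from the tables above.
# The dictionary keys are illustrative labels, not Reducto identifiers.
REQUIRED_VARS: Dict[str, List[str]] = {
    "litellm_proxy": [
        "LITELLM_PROXY_URL",
        "LITELLM_PROXY_FAST_MODEL",
        "LITELLM_PROXY_ACCURATE_MODEL",
    ],
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": [
        "AZURE_OPENAI_API_KEY",
        "AZURE_OPENAI_ENDPOINT",
        "OPENAI_API_VERSION",
        "AZURE_OPENAI_MODEL_MAP",
    ],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID", "GCP_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "bedrock": [
        "USE_CLAUDE_BEDROCK",
        "AWS_ACCESS_KEY_ID",
        "AWS_SECRET_ACCESS_KEY",
        "AWS_REGION",
    ],
}


def missing_vars(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """List required variables that are unset or empty for a provider."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[provider] if not env.get(name)]
```

Calling `missing_vars("bedrock")` in the environment where Reducto will run returns an empty list when all four Bedrock variables are present.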
Reducto offers GPU-based models for structured data extraction and fine-grained citations. For best results, deploy both models together. Model weights are downloaded from HuggingFace using a scoped token provided by Reducto.
Prerequisites
Create a Kubernetes secret with the HuggingFace token provided by Reducto:
kubectl create secret generic reducto-hf-token --from-literal=HF_TOKEN=hf_...
We recommend enabling modelStorage to cache weights on a PVC so restarts don’t re-download.
GPU requirement: 1x NVIDIA H200 (will not fit on H100/A100/A10G).
yamlExtract:
  enabled: true
  gpu: "H200"
  modelStorage:
    enabled: true
    size: "100Gi"
    storageClassName: "your-storage-class"
When enabled, REDUCTO_YAML_EXTRACT_URL is automatically injected into all worker and HTTP pods.
Citation model (7B)
GPU requirement: 1x NVIDIA H100 or H200.
citationModel:
  enabled: true
  gpu: "H200" # or "H100"
  modelStorage:
    enabled: true
    size: "50Gi"
    storageClassName: "your-storage-class"
When enabled, REDUCTO_CITATION_URL is automatically injected into all worker and HTTP pods. If not deployed, citations fall back to your configured external LLM provider.
Model path overrides
Both deployments expose a modelPath field that can be updated if Reducto ships new model weights:
yamlExtract:
  modelPath: "reducto/extract_30b_0108" # update when directed by Reducto
citationModel:
  modelPath: "reducto/citation_7b_mimo_0812" # update when directed by Reducto
If you do not deploy either GPU model, extraction uses your configured external LLM provider (OpenAI, Anthropic, Google, Azure, or Bedrock). No additional configuration is needed.
| Variable | Description | Required |
|---|---|---|
| LOCAL_EXTRACT_CITATIONS_MODEL | Fine-tuned OpenAI model ID (e.g., openai:ft:gpt-4.1-2025-04-14:...) | No |
When set, this takes priority over the self-hosted extraction model.
Request-level LLM overrides
In addition to environment variables, on-prem deployments can override LLM configuration at the request level using the overrides parameter in experimental_options.
Key-value processing overrides
Override the model and add custom instructions for key-value (form) region processing:
{
  "document_url": "https://example.com/form.pdf",
  "experimental_options": {
    "overrides": {
      "key_value": {
        "model": "google:gemini-2.5-flash-lite",
        "custom_instructions": "Pay special attention to date fields. Use MM/DD/YYYY format."
      }
    }
  }
}
| Field | Description |
|---|---|
| model | Model alias (fast, accurate) or provider:model format |
| custom_instructions | Additional instructions appended to the default prompt |
Resolution order
Model resolution:
- Request override: experimental_options.overrides.key_value.model
- Environment variable: LOCAL_KV_MODEL
- Code default: based on deployment configuration
Prompt resolution:
- Base prompt: LOCAL_KV_PROMPT env var, or built-in default
- Custom instructions: appended from overrides.key_value.custom_instructions
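The resolution order above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Reducto's implementation: the function names, the `BUILT_IN_PROMPT` placeholder, and the blank-line separator between the base prompt and the custom instructions are all hypothetical. Returning `None` from `resolve_kv_model` stands in for "use the code default".

```python
from typing import Any, Mapping, Optional

# Placeholder only; the real built-in prompt is not documented here.
BUILT_IN_PROMPT = "<built-in default prompt>"


def resolve_kv_model(overrides: Mapping[str, Any], env: Mapping[str, str]) -> Optional[str]:
    """Pick the key-value model: request override, then env var, then default."""
    requested = overrides.get("key_value", {}).get("model")
    if requested:
        return requested
    env_model = env.get("LOCAL_KV_MODEL")
    if env_model:
        return env_model
    return None  # caller falls back to the deployment's code default


def resolve_kv_prompt(overrides: Mapping[str, Any], env: Mapping[str, str]) -> str:
    """Build the prompt: base from env or built-in, then appended instructions."""
    base = env.get("LOCAL_KV_PROMPT") or BUILT_IN_PROMPT
    extra = overrides.get("key_value", {}).get("custom_instructions")
    # The separator is an assumption; the source only says "appended".
    return f"{base}\n\n{extra}" if extra else base
```

Here `overrides` corresponds to the `experimental_options.overrides` object of a request and `env` to the process environment, so a request-level model always wins over LOCAL_KV_MODEL, while custom instructions extend the base prompt rather than replacing it.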
Environment variable defaults
| Variable | Description | Default |
|---|---|---|
| LOCAL_KV_MODEL | Override model for KV processing | None (uses built-in cascade) |
| LOCAL_KV_PROMPT | Base prompt for KV processing (can be fully replaced) | Built-in prompt |
AI usage tracking
Reducto tracks language model consumption throughout the document processing pipeline. It reports token usage, request counts, and the models used, which you can use for billing and optimization.
How AI usage tracking works
The AI usage tracking system operates at the block level within the parsing pipeline:
- Token Counting: Each AI operation (table summarization, figure analysis, key-value extraction, etc.) records token consumption
- Request Tracking: The system counts API calls made to each model
- Model Identification: Usage is tracked per model type with provider information
- Aggregation: Usage is aggregated across all blocks and pages for comprehensive reporting
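The aggregation step can be illustrated with a short Python sketch. It assumes each AI operation emits one record per block, shaped like the entries of the usage report (modelType, modelProvider, and the four counters); that per-block record shape and the `aggregate_usage` function are hypothetical, chosen only to show how per-block counts roll up into one entry per model.

```python
from typing import Any, Dict, Iterable, List

# Counters that are summed across blocks for each model.
COUNT_FIELDS = (
    "promptTokenCount",
    "completionTokenCount",
    "cachedTokenCount",
    "requestCount",
)


def aggregate_usage(block_records: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Combine per-block usage records into one total per model type."""
    totals: Dict[str, Dict[str, Any]] = {}
    for record in block_records:
        model = record["modelType"]
        if model not in totals:
            # First record for this model: start all counters at zero.
            totals[model] = {
                "modelType": model,
                "modelProvider": record.get("modelProvider"),
                **{field: 0 for field in COUNT_FIELDS},
            }
        for field in COUNT_FIELDS:
            totals[model][field] += record.get(field, 0)
    return list(totals.values())
```

Two blocks that each call the same model once therefore produce a single entry with a request count of 2 and summed token counts.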
Available via /parse API
AI usage information is currently available only through the /parse API endpoint, using the custom_format parameter. Other API endpoints do not return it.
When enabled, the system returns an AIUsageInfo object containing:
{
  "did_use_ai_models": true,
  "ai_usage_info": [
    {
      "promptTokenCount": 1500,
      "completionTokenCount": 300,
      "cachedTokenCount": 0,
      "requestCount": 2,
      "modelType": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
      "modelProvider": "anthropic",
      "modelRateLimitFamily": "us.anthropic.claude-3-7-sonnet"
    }
  ]
}
Field descriptions
did_use_ai_models: Boolean indicating whether any AI models were used during processing
ai_usage_info: Array of usage information objects, one per model type used
promptTokenCount: Total input tokens sent to the model
completionTokenCount: Total output tokens generated by the model
cachedTokenCount: Total cached tokens used (when supported by provider)
requestCount: Number of API calls made to this model
modelType: Standardized model identifier
modelProvider: Provider name (e.g., "anthropic", "openai")
modelRateLimitFamily: Rate limiting group for the model
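As one way to consume these fields, the Python sketch below totals prompt and completion tokens per provider from an AIUsageInfo object. The function name `tokens_by_provider` is hypothetical, and the sketch leaves cachedTokenCount out of the total because the source does not say whether cached tokens are already included in promptTokenCount.

```python
from typing import Any, Dict, Mapping


def tokens_by_provider(usage: Mapping[str, Any]) -> Dict[str, int]:
    """Sum prompt and completion tokens per modelProvider."""
    if not usage.get("did_use_ai_models"):
        return {}  # no AI models were used, so there is nothing to total
    totals: Dict[str, int] = {}
    for item in usage.get("ai_usage_info", []):
        provider = item["modelProvider"]
        tokens = item["promptTokenCount"] + item["completionTokenCount"]
        totals[provider] = totals.get(provider, 0) + tokens
    return totals
```

For the sample object shown earlier, the 1500 prompt tokens and 300 completion tokens are both attributed to the `anthropic` provider.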
Enabling AI usage tracking
To retrieve AI usage information, set the custom_format parameter to "ai_usage" in your /parse request:
{
  "input": "your_document_url",
  "settings": {
    "custom_format": "ai_usage"
  }
}
Tracked AI operations
The system tracks usage from these AI-powered features:
- Table Summarization: Analysis and description of complex tables
- Figure Summarization: Analysis and description of images and charts
- Key-Value enrichment: Enrichment for form-like regions within documents
Model name standardization
The system automatically standardizes model names for consistent reporting:
- Internal model identifiers are mapped to standard formats
- Provider information is automatically added
- Rate limit families are identified for capacity planning
Possible model identifiers
The following model identifiers may appear in the modelType field of AI usage tracking responses, if you have OpenAI and Anthropic access enabled:
OpenAI models
gpt-4o-2024-08-06
gpt-4o-mini-2024-07-18
Anthropic models
claude-haiku-4-5-20251001
claude-3-7-sonnet-20250219
If you enable other model providers, their model identifiers appear in modelType with provider-specific prefixes.