OCR service configuration

For detailed OCR provider configuration (AWS Textract, Azure Vision, GCP Vision API, cross-cloud OCR, and GPU OCR deployment), see the dedicated OCR provider configuration page.

LLM provider environment variables

Reducto supports multiple LLM providers through environment variables. Below is a complete list of supported providers and their required environment variables.

LiteLLM proxy

| Variable | Description | Required |
| --- | --- | --- |
| LITELLM_PROXY_URL | URL of the LiteLLM proxy | Yes |
| LITELLM_PROXY_FAST_MODEL | Fast model to route to via the proxy | Yes |
| LITELLM_PROXY_ACCURATE_MODEL | Accurate model to route to via the proxy | Yes |

OpenAI

| Variable | Description | Required |
| --- | --- | --- |
| OPENAI_API_KEY | Your OpenAI API key | Yes |

Azure OpenAI

| Variable | Description | Required |
| --- | --- | --- |
| AZURE_OPENAI_API_KEY | Your Azure OpenAI API key | Yes |
| AZURE_OPENAI_ENDPOINT | Azure OpenAI endpoint (e.g., https://your-resource-name.openai.azure.com/) | Yes |
| OPENAI_API_VERSION | Azure OpenAI API version (e.g., 2024-10-21) | Yes |
| AZURE_OPENAI_MODEL_MAP | Comma-separated map of model names to Azure deployment names, or a single default deployment. Reducto uses the following models, and each should resolve to a deployment unless a single default is supplied: gpt-4o-2024-08-06, gpt-4o, gpt-4o-mini-2024-07-18, gpt-4o-mini, gpt-4.1, o1. Examples: my-default-deployment (single default) or gpt-4o=my-prod-dep, gpt-4o-mini=gpt4o-mini-dep | Yes |
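The exact parsing rules for AZURE_OPENAI_MODEL_MAP are not spelled out here. The following is a minimal sketch for checking a map before deployment, assuming entries are comma-separated model=deployment pairs and that a value with no = is a single default deployment. The function names are hypothetical and not part of Reducto.

```python
"""Hypothetical sketch: resolve model names via an AZURE_OPENAI_MODEL_MAP value."""
from typing import Dict, List, Optional, Tuple

# Models this page lists as used by Reducto.
REQUIRED_MODELS = ["gpt-4o-2024-08-06", "gpt-4o", "gpt-4o-mini-2024-07-18",
                   "gpt-4o-mini", "gpt-4.1", "o1"]


def parse_model_map(raw: str) -> Tuple[Dict[str, str], Optional[str]]:
    """Return (explicit model=deployment pairs, single default deployment or None).

    Assumption: a value containing no '=' is a single default deployment.
    """
    raw = raw.strip()
    if "=" not in raw:
        return {}, raw or None
    mapping: Dict[str, str] = {}
    for entry in raw.split(","):
        model, sep, deployment = entry.partition("=")
        if sep:  # skip empty or malformed entries
            mapping[model.strip()] = deployment.strip()
    return mapping, None


def resolve_deployment(model: str, raw: str) -> Optional[str]:
    """Deployment name for a model, or None if it would not resolve."""
    mapping, default = parse_model_map(raw)
    return mapping.get(model, default)


def unmapped_models(raw: str) -> List[str]:
    """Required models that resolve to no deployment under this map."""
    return [m for m in REQUIRED_MODELS if resolve_deployment(m, raw) is None]


print(unmapped_models("gpt-4o=my-prod-dep, gpt-4o-mini=gpt4o-mini-dep"))
# ['gpt-4o-2024-08-06', 'gpt-4o-mini-2024-07-18', 'gpt-4.1', 'o1']
```

Under these assumptions, the two-entry example from the table would leave four of the listed models without a deployment, so you would either map every model or supply a single default.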

Anthropic

| Variable | Description | Required |
| --- | --- | --- |
| ANTHROPIC_API_KEY | Your Anthropic API key | Yes |

Google

| Variable | Description | Required |
| --- | --- | --- |
| GOOGLE_APPLICATION_CREDENTIALS | Service account key JSON with the roles/aiplatform.user role for Vertex AI | Yes |
| GCP_PROJECT_ID | GCP project for the Cloud Vision API | Yes |
| GCP_REGION | Region for Vertex AI (defaults to us-central1) | No |
| GCP_API_KEY | API key with no application or API restrictions, used to access the Cloud Vision API | Yes |

Gemini

| Variable | Description | Required |
| --- | --- | --- |
| GEMINI_API_KEY | Your Gemini API key | Yes |

AWS Bedrock

| Variable | Description | Required |
| --- | --- | --- |
| USE_CLAUDE_BEDROCK | Set to any value to enable Claude via AWS Bedrock | Yes |
| AWS_ACCESS_KEY_ID | AWS access key ID | Yes, when using Bedrock |
| AWS_SECRET_ACCESS_KEY | AWS secret access key | Yes, when using Bedrock |
| AWS_REGION | AWS region name | Yes, when using Bedrock |
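To sanity-check an environment before starting Reducto, you can compare it against the required variables in the tables above. This is a minimal sketch, not a Reducto tool: the PROVIDER_REQUIREMENTS dictionary is transcribed from those tables, optional variables such as GCP_REGION are left out, and a variable set to an empty string is treated as missing (an assumption).

```python
"""Hypothetical pre-flight check: which LLM providers are fully configured?"""
import os
from typing import Dict, List, Mapping

# Required variables per provider, transcribed from the tables on this page.
PROVIDER_REQUIREMENTS: Dict[str, List[str]] = {
    "litellm": ["LITELLM_PROXY_URL", "LITELLM_PROXY_FAST_MODEL",
                "LITELLM_PROXY_ACCURATE_MODEL"],
    "openai": ["OPENAI_API_KEY"],
    "azure_openai": ["AZURE_OPENAI_API_KEY", "AZURE_OPENAI_ENDPOINT",
                     "OPENAI_API_VERSION", "AZURE_OPENAI_MODEL_MAP"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "google": ["GOOGLE_APPLICATION_CREDENTIALS", "GCP_PROJECT_ID", "GCP_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    # The AWS variables are only required when USE_CLAUDE_BEDROCK is set.
    "bedrock": ["USE_CLAUDE_BEDROCK", "AWS_ACCESS_KEY_ID",
                "AWS_SECRET_ACCESS_KEY", "AWS_REGION"],
}


def missing_variables(env: Mapping[str, str]) -> Dict[str, List[str]]:
    """For each provider, the required variables that are unset or empty."""
    return {
        provider: [name for name in names if not env.get(name)]
        for provider, names in PROVIDER_REQUIREMENTS.items()
    }


def configured_providers(env: Mapping[str, str]) -> List[str]:
    """Providers whose required variables are all present."""
    return [p for p, missing in missing_variables(env).items() if not missing]


if __name__ == "__main__":
    print("Configured providers:", configured_providers(os.environ) or "none")
```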

GPU-based extraction models

Reducto offers GPU-based models for structured data extraction and fine-grained citations. For best results, deploy both models together. Model weights are downloaded from HuggingFace using a scoped token provided by Reducto.

Prerequisites

Create a Kubernetes secret with the HuggingFace token provided by Reducto:

```shell
kubectl create secret generic reducto-hf-token --from-literal=HF_TOKEN=hf_...
```

We recommend enabling modelStorage so that weights are cached on a PersistentVolumeClaim (PVC) and are not re-downloaded when pods restart.

YAML extraction model (30B)

GPU requirement: 1x NVIDIA H200 (will not fit on H100/A100/A10G).
```yaml
yamlExtract:
  enabled: true
  gpu: "H200"
  modelStorage:
    enabled: true
    size: "100Gi"
    storageClassName: "your-storage-class"
```

When enabled, REDUCTO_YAML_EXTRACT_URL is automatically injected into all worker and HTTP pods.

Citation model (7B)

GPU requirement: 1x NVIDIA H100 or H200.
```yaml
citationModel:
  enabled: true
  gpu: "H200"  # or "H100"
  modelStorage:
    enabled: true
    size: "50Gi"
    storageClassName: "your-storage-class"
```

When enabled, REDUCTO_CITATION_URL is automatically injected into all worker and HTTP pods. If the citation model is not deployed, citations fall back to your configured external LLM provider.
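Because each model has a hard GPU requirement, it can help to check your Helm values before installing. Below is a minimal sketch, assuming the values file has already been loaded into a Python dict (for example with a YAML parser); the allowed GPU sets come from the requirements stated above, and validate_gpu_values is a hypothetical helper.

```python
"""Hypothetical check of GPU settings in Helm values for the GPU extraction models."""
from typing import Any, Dict, List

# GPU types each model supports, per the requirements on this page.
ALLOWED_GPUS = {
    "yamlExtract": {"H200"},            # 30B model: H200 only
    "citationModel": {"H100", "H200"},  # 7B model: H100 or H200
}


def validate_gpu_values(values: Dict[str, Any]) -> List[str]:
    """Return findings for enabled models; an empty list means nothing to flag."""
    findings: List[str] = []
    for section, allowed in ALLOWED_GPUS.items():
        cfg = values.get(section) or {}
        if not cfg.get("enabled"):
            continue  # disabled models are not checked
        gpu = cfg.get("gpu")
        if gpu not in allowed:
            findings.append(f"{section}.gpu={gpu!r}; expected one of {sorted(allowed)}")
        # modelStorage is recommended, not required: flag it as a warning.
        if not (cfg.get("modelStorage") or {}).get("enabled"):
            findings.append(f"{section}.modelStorage is disabled; weights re-download on restart")
    return findings
```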

Model path overrides

Both deployments expose a modelPath field that can be updated if Reducto ships new model weights:
```yaml
yamlExtract:
  modelPath: "reducto/extract_30b_0108"  # update when directed by Reducto

citationModel:
  modelPath: "reducto/citation_7b_mimo_0812"  # update when directed by Reducto
```

Extraction without GPU models

If you do not deploy either GPU model, extraction uses your configured external LLM provider (OpenAI, Anthropic, Google, Azure, or Bedrock). No additional configuration is needed.

Fine-tuned OpenAI extraction model (alternative)

| Variable | Description | Required |
| --- | --- | --- |
| LOCAL_EXTRACT_CITATIONS_MODEL | Fine-tuned OpenAI model ID (e.g., openai:ft:gpt-4.1-2025-04-14:...) | No |

When set, this takes priority over the self-hosted extraction model.
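The precedence described in this section can be summarized as a small decision function. This sketch restates the documented behavior and is not Reducto's code: it assumes that the presence of REDUCTO_YAML_EXTRACT_URL and REDUCTO_CITATION_URL (injected when the GPU models are enabled) signals that the self-hosted models are available, and the returned labels are hypothetical.

```python
"""Sketch of the documented precedence for extraction and citation backends."""
from typing import Mapping


def extraction_backend(env: Mapping[str, str]) -> str:
    # 1. A fine-tuned OpenAI model ID takes priority over the self-hosted model.
    if env.get("LOCAL_EXTRACT_CITATIONS_MODEL"):
        return "fine-tuned OpenAI: " + env["LOCAL_EXTRACT_CITATIONS_MODEL"]
    # 2. The self-hosted 30B model, whose URL is injected when yamlExtract is enabled.
    if env.get("REDUCTO_YAML_EXTRACT_URL"):
        return "self-hosted YAML extraction model"
    # 3. Otherwise, the configured external LLM provider.
    return "external LLM provider"


def citation_backend(env: Mapping[str, str]) -> str:
    # The 7B citation model when deployed; otherwise the external provider.
    if env.get("REDUCTO_CITATION_URL"):
        return "self-hosted citation model"
    return "external LLM provider"
```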

Request-level LLM overrides

In addition to environment variables, on-prem deployments can override LLM configuration at the request level using the overrides parameter in experimental_options.

Key-value processing overrides

Override the model and add custom instructions for key-value (form) region processing:
```json
{
  "document_url": "https://example.com/form.pdf",
  "experimental_options": {
    "overrides": {
      "key_value": {
        "model": "google:gemini-2.5-flash-lite",
        "custom_instructions": "Pay special attention to date fields. Use MM/DD/YYYY format."
      }
    }
  }
}
```

| Field | Description |
| --- | --- |
| model | Model alias (fast, accurate) or provider:model format |
| custom_instructions | Additional instructions appended to the default prompt |
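If you build requests programmatically, the override can be attached to any request body. A minimal sketch using only the standard library; the helper name is hypothetical, and the field names are copied from the example above. It only constructs the JSON body and does not send it.

```python
"""Hypothetical helper: attach key-value overrides to a request body."""
import copy
import json
from typing import Any, Dict, Optional


def with_kv_overrides(body: Dict[str, Any], model: Optional[str] = None,
                      custom_instructions: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of body with experimental_options.overrides.key_value set."""
    key_value: Dict[str, str] = {}
    if model is not None:
        key_value["model"] = model  # "fast", "accurate", or "provider:model"
    if custom_instructions is not None:
        key_value["custom_instructions"] = custom_instructions
    result = copy.deepcopy(body)  # leave the caller's dict untouched
    overrides = result.setdefault("experimental_options", {}).setdefault("overrides", {})
    overrides["key_value"] = key_value
    return result


payload = with_kv_overrides(
    {"document_url": "https://example.com/form.pdf"},
    model="google:gemini-2.5-flash-lite",
    custom_instructions="Pay special attention to date fields. Use MM/DD/YYYY format.",
)
print(json.dumps(payload, indent=2))  # same structure as the JSON example above
```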

Resolution order

Model resolution:
  1. Request override - experimental_options.overrides.key_value.model
  2. Environment variable - LOCAL_KV_MODEL
  3. Code default - Based on deployment configuration
Prompt resolution:
  1. Base prompt - LOCAL_KV_PROMPT env var, or built-in default
  2. Custom instructions - Appended from overrides.key_value.custom_instructions
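The resolution order above can be expressed as two small functions. This is a sketch, not Reducto's implementation: the code default and built-in prompt are passed in as parameters because their values are not documented, and joining the custom instructions to the base prompt with a blank line is an assumption (the page only says they are appended).

```python
"""Sketch of the documented model and prompt resolution for key-value processing."""
from typing import Any, Mapping


def _kv_override(request: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return experimental_options.overrides.key_value, or {} if absent."""
    options = request.get("experimental_options") or {}
    return (options.get("overrides") or {}).get("key_value") or {}


def resolve_kv_model(request: Mapping[str, Any], env: Mapping[str, str],
                     code_default: str) -> str:
    # 1. Request override, 2. LOCAL_KV_MODEL, 3. code default.
    return _kv_override(request).get("model") or env.get("LOCAL_KV_MODEL") or code_default


def resolve_kv_prompt(request: Mapping[str, Any], env: Mapping[str, str],
                      builtin_prompt: str) -> str:
    # Base prompt: LOCAL_KV_PROMPT fully replaces the built-in default.
    prompt = env.get("LOCAL_KV_PROMPT") or builtin_prompt
    extra = _kv_override(request).get("custom_instructions")
    # Custom instructions are appended; the blank-line separator is an assumption.
    return f"{prompt}\n\n{extra}" if extra else prompt
```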

Environment variable defaults

| Variable | Description | Default |
| --- | --- | --- |
| LOCAL_KV_MODEL | Override model for KV processing | None (uses built-in cascade) |
| LOCAL_KV_PROMPT | Base prompt for KV processing (can be fully replaced) | Built-in prompt |

AI usage tracking

Reducto tracks language model consumption throughout the document processing pipeline. It reports token usage, request counts, and per-model utilization, which you can use for billing and optimization.

How AI usage tracking works

The AI usage tracking system operates at the block level within the parsing pipeline:
  1. Token Counting: Each AI operation (table summarization, figure analysis, key-value extraction, etc.) records token consumption
  2. Request Tracking: The system counts API calls made to each model
  3. Model Identification: Usage is tracked per model type with provider information
  4. Aggregation: Usage is aggregated across all blocks and pages for comprehensive reporting
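The aggregation step can be pictured as summing per-operation records keyed by model. The sketch below is illustrative only: the UsageRecord type is hypothetical, modelRateLimitFamily is omitted, and the output reuses the field names of the AIUsageInfo structure described later on this page.

```python
"""Illustrative aggregation of per-operation AI usage into per-model totals."""
from dataclasses import dataclass
from typing import Any, Dict, Iterable


@dataclass
class UsageRecord:
    """Hypothetical record for one AI operation on one block."""
    model_type: str
    model_provider: str
    prompt_tokens: int
    completion_tokens: int
    cached_tokens: int = 0
    requests: int = 1


def aggregate(records: Iterable[UsageRecord]) -> Dict[str, Any]:
    """Sum token and request counts per model across all blocks and pages."""
    per_model: Dict[str, Dict[str, Any]] = {}
    for r in records:
        entry = per_model.setdefault(r.model_type, {
            "promptTokenCount": 0, "completionTokenCount": 0,
            "cachedTokenCount": 0, "requestCount": 0,
            "modelType": r.model_type, "modelProvider": r.model_provider,
        })
        entry["promptTokenCount"] += r.prompt_tokens
        entry["completionTokenCount"] += r.completion_tokens
        entry["cachedTokenCount"] += r.cached_tokens
        entry["requestCount"] += r.requests
    usage = list(per_model.values())
    return {"did_use_ai_models": bool(usage), "ai_usage_info": usage}
```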

Available via /parse API

AI usage information is currently available only through the /parse endpoint, using the custom_format parameter; other API endpoints do not return it.

Usage information structure

When enabled, the system returns an AIUsageInfo object containing:
```json
{
  "did_use_ai_models": true,
  "ai_usage_info": [
    {
      "promptTokenCount": 1500,
      "completionTokenCount": 300,
      "cachedTokenCount": 0,
      "requestCount": 2,
      "modelType": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
      "modelProvider": "anthropic",
      "modelRateLimitFamily": "us.anthropic.claude-3-7-sonnet"
    }
  ]
}
```

Field descriptions

  • did_use_ai_models: Boolean indicating whether any AI models were used during processing
  • ai_usage_info: Array of usage information objects, one per model type used
  • promptTokenCount: Total input tokens sent to the model
  • completionTokenCount: Total output tokens generated by the model
  • cachedTokenCount: Total cached tokens used (when supported by provider)
  • requestCount: Number of API calls made to this model
  • modelType: Standardized model identifier
  • modelProvider: Provider name (e.g., “anthropic”, “openai”)
  • modelRateLimitFamily: Rate limiting group for the model
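On the client side, you may want to roll the per-model entries up into totals, for example for cost reporting. A minimal sketch that assumes the response has the shape shown above; totals_by_provider is a hypothetical helper, and the example data is the sample response from this page.

```python
"""Roll an AI usage response up into per-provider totals."""
from typing import Any, Dict


def totals_by_provider(usage: Dict[str, Any]) -> Dict[str, Dict[str, int]]:
    """Sum token and request counts across models, grouped by modelProvider."""
    totals: Dict[str, Dict[str, int]] = {}
    if not usage.get("did_use_ai_models"):
        return totals
    for item in usage.get("ai_usage_info", []):
        t = totals.setdefault(item.get("modelProvider", "unknown"),
                              {"prompt": 0, "completion": 0, "cached": 0, "requests": 0})
        t["prompt"] += item.get("promptTokenCount", 0)
        t["completion"] += item.get("completionTokenCount", 0)
        t["cached"] += item.get("cachedTokenCount", 0)
        t["requests"] += item.get("requestCount", 0)
    return totals


example = {
    "did_use_ai_models": True,
    "ai_usage_info": [{
        "promptTokenCount": 1500, "completionTokenCount": 300, "cachedTokenCount": 0,
        "requestCount": 2, "modelType": "us.anthropic.claude-3-7-sonnet-20250219-v1:0",
        "modelProvider": "anthropic", "modelRateLimitFamily": "us.anthropic.claude-3-7-sonnet",
    }],
}
print(totals_by_provider(example))
# {'anthropic': {'prompt': 1500, 'completion': 300, 'cached': 0, 'requests': 2}}
```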

Enabling AI usage tracking

To retrieve AI usage information, set the custom_format parameter to "ai_usage" in your /parse request:
```json
{
  "input": "your_document_url",
  "settings": {
    "custom_format": "ai_usage"
  }
}
```
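A sketch of building this request from Python with only the standard library. The base URL, the /parse path being served directly under it, and the Bearer authorization header are assumptions for illustration; substitute the values for your deployment. The request is constructed here but not sent.

```python
"""Hypothetical sketch: build a /parse request that asks for AI usage information."""
import json
import urllib.request

BASE_URL = "http://reducto.internal:8080"  # hypothetical on-prem address
API_KEY = "YOUR_API_KEY"                    # placeholder; the auth scheme is an assumption


def build_parse_request(document_url: str) -> urllib.request.Request:
    """Create a POST request whose body matches the JSON example above."""
    body = {"input": document_url, "settings": {"custom_format": "ai_usage"}}
    return urllib.request.Request(
        f"{BASE_URL}/parse",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )


req = build_parse_request("https://example.com/form.pdf")
# To send it: with urllib.request.urlopen(req) as resp: result = json.load(resp)
```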

Tracked AI operations

The system tracks usage from these AI-powered features:
  • Table Summarization: Analysis and description of complex tables
  • Figure Summarization: Analysis and description of images and charts
  • Key-Value Enrichment: Enrichment of form-like regions within documents

Model name standardization

The system automatically standardizes model names for consistent reporting:
  • Internal model identifiers are mapped to standard formats
  • Provider information is automatically added
  • Rate limit families are identified for capacity planning
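The exact mapping rules are internal to Reducto. As an illustration only, the heuristic below reproduces the provider and rate-limit family from the example response above (us.anthropic.claude-3-7-sonnet-20250219-v1:0 becomes family us.anthropic.claude-3-7-sonnet with provider anthropic) by stripping a trailing date-and-version suffix. It is an assumption inferred from that one example, not the documented algorithm.

```python
"""Illustrative heuristic for deriving provider and rate-limit family from a model ID."""
import re
from typing import Tuple

# A "-YYYYMMDD" or "-YYYY-MM-DD" date and everything after it (e.g. "-v1:0").
_DATE_SUFFIX = re.compile(r"-(?:\d{8}|\d{4}-\d{2}-\d{2}).*$")


def rate_limit_family(model_type: str) -> str:
    """Drop the date/version suffix, keeping any region or provider prefix."""
    return _DATE_SUFFIX.sub("", model_type)


def provider_of(model_type: str) -> str:
    """Guess the provider from well-known name patterns."""
    lowered = model_type.lower()
    if "anthropic" in lowered or lowered.startswith("claude"):
        return "anthropic"
    if lowered.startswith(("gpt-", "o1")):
        return "openai"
    return "unknown"


def standardize(model_type: str) -> Tuple[str, str]:
    return provider_of(model_type), rate_limit_family(model_type)
```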

Possible model identifiers

The following model identifiers may appear in the modelType field of AI usage tracking responses, if you have OpenAI and Anthropic access enabled:

OpenAI models

  • gpt-4o-2024-08-06
  • gpt-4o-mini-2024-07-18

Anthropic models

  • claude-haiku-4-5-20251001
  • claude-3-7-sonnet-20250219
If you enable other model providers, their model identifiers also appear, each with its own provider-specific prefix.