# On-premise changelog

> Release notes for on-premise deployments of Reducto

export const PasswordProtect = ({children}) => {
  const [password, setPassword] = useState("");
  const [isAuthenticated, setIsAuthenticated] = useState(false);
  const [error, setError] = useState("");
  const correctPasswordHash = "9daff39ca2584edc54444193f62e5e54dce0bcd5e5d604b1748c79bfb3d7d1fd";
  const hashPassword = async inputPassword => {
    const encoder = new TextEncoder();
    const data = encoder.encode(inputPassword);
    const hashBuffer = await crypto.subtle.digest('SHA-256', data);
    const hashArray = Array.from(new Uint8Array(hashBuffer));
    return hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
  };
  useEffect(() => {
    const storedPassword = localStorage.getItem("reducto-onprem-password");
    if (storedPassword) {
      checkStoredPassword(storedPassword);
    }
  }, []);
  const checkStoredPassword = async storedPassword => {
    const hashedStored = await hashPassword(storedPassword);
    if (hashedStored === correctPasswordHash) {
      setIsAuthenticated(true);
      setError("");
    }
  };
  const handleSubmit = async e => {
    e.preventDefault();
    const hashedInput = await hashPassword(password);
    if (hashedInput === correctPasswordHash) {
      setIsAuthenticated(true);
      setError("");
      localStorage.setItem("reducto-onprem-password", password);
    } else {
      setError("Incorrect password. Please try again.");
      setPassword("");
    }
  };
  if (isAuthenticated) {
    return <>{children}</>;
  }
  return <div style={{
    padding: "2rem",
    border: "2px solid #e2e8f0",
    borderRadius: "8px",
    textAlign: "center",
    margin: "2rem 0"
  }}>
      <div style={{
    fontSize: "2rem",
    marginBottom: "1rem"
  }}>🔒</div>
      <h2>Protected Content</h2>
      <p>This content requires a password to access.</p>
      <form onSubmit={handleSubmit} style={{
    marginTop: "1rem"
  }}>
        <input type="password" value={password} onChange={e => setPassword(e.target.value)} placeholder="Enter password" style={{
    padding: "0.5rem",
    border: "1px solid #cbd5e0",
    borderRadius: "4px",
    marginRight: "0.5rem",
    fontSize: "1rem"
  }} />
        <button type="submit" style={{
    padding: "0.5rem 1rem",
    backgroundColor: "#5c0c5c",
    color: "white",
    border: "none",
    borderRadius: "4px",
    cursor: "pointer",
    fontSize: "1rem"
  }}>
          Unlock
        </button>
      </form>
      {error && <p style={{
    color: "red",
    marginTop: "1rem"
  }}>{error}</p>}
    </div>;
};

<PasswordProtect>
  <Update description="2026-03-26" label="v1.11.57">
    * perf: Parallelize KV fallback to prevent task deadline breaches
  </Update>

  <Update description="2026-03-26" label="v1.11.56">
    * feat: Native DOCX XML parsing pipeline (alpha) with `.pages` support
    * feat: HEIC image support and section-based chunking for Numbers files
    * feat: Formatting and images support for Numbers files
    * feat: Extract model and internal prompt overrides for v2/v3 configurations
    * feat: YAML extract and citations models added to Helm chart for GPU deployments
    * feat: KV repetition detection with Gemini fallback replacing repetition\_penalty
    * feat: OTEL pipeline routing and K8s metrics collection
    * fix: Auth for chained jobs
    * fix: Required fields on extraction schema
    * fix: Equation detection TypeError from tuple/list concatenation
    * fix: Move sync blocking calls off the event loop in HTTP handlers
    * fix: Use encrypted DB URL and disable k8s metrics in on-prem environments
    * fix: Fail fast on hung PDF renders
    * fix: Parallelize S3 batch result loading for faster retrieval
    * perf: Optimized layout postprocessing
    * perf: Optimized hybrid OCR processing
    * refactor: Graceful fallbacks for XML conversion issues
    * chore: Cron retries increased from 1 to 2 for improved reliability
    * chore: Send total credits for on-prem customers
  </Update>

  <Update description="2026-03-05" label="v1.11.51">
    * feat: OCR-based table citations for deep extract
    * feat: Add API key prefix filter parameter for /jobs endpoint
    * fix: On-prem presigned URL upload path mismatch
    * fix: new layout postprocessing
    * fix: Empty OCR fallback handling
    * fix: Python version in sandbox runtime
    * chore: Upgraded enhanced figure summary models
  </Update>

  <Update description="2026-03-01" label="v1.11.49">
    * fix: On-prem deployments now start without Redis configured
  </Update>

  <Update description="2026-03-01" label="v1.11.48">
    * feat: On-prem usage logging for customer tracking
  </Update>

  <Update description="2026-03-01" label="v1.11.47">
    * feat: Schemaless deep extract
    * feat: Granular citations in deep extract
    * feat: Deep extract available for on-prem deployments
    * feat: Spreadsheet sheet-name page\_range support
    * feat: Suppress citation content feature flag for extract
    * fix: Garbled DOCX for change tracking
    * fix: Recursive render to handle nested tables for edit
    * fix: Rotation in embed metadata
    * fix: GCP API key requirement for on-prem
    * fix: Remove libpq options from DB connect\_args for RDS Proxy compatibility
    * fix: Lock timeout batches and non-locking last-batch check
    * fix: Background threads no longer block main processing
    * fix: Transient classify inference issues
    * perf: Merge tables speedup from O(N^2) to O(N)
    * perf: Retry improvements to avoid double work
    * refactor: Decompose batch pipeline into composable phases
    * chore: Upgrade Gemini models
  </Update>

  <Update description="2026-02-25" label="v1.11.46">
    * fix: Middleware context propagation and ordering for proper trace handling
    * fix: Cron job retry logic and syntax improvements
    * fix: HTTP startup patched for hashlib.md5 in FIPS environments (GCS support)
    * perf: Decaying timeout for retries with improved retry\_on\_timeout behavior
    * perf: Updated timeout and max batch configurations
  </Update>

  <Update description="2026-02-24" label="v1.11.45">
    * feat: Chunk overlap configuration for including text context from previous/next chunks
    * feat: summarize\_all\_figures option in v3 alpha config
    * feat: Deep extraction optimizations for improved structured data quality
    * fix: Lazy loading for HTTP/worker modules to avoid unnecessary dependency imports
    * fix: Guard against empty document\_url list in pipeline and split endpoints
    * fix: Cron job improvements
    * fix: More reliable page orientation detection
    * fix: Exception handling on deep extraction completion
    * refactor: Deep extract sandbox image
    * chore: Upgraded Anthropic models
    * chore: New table detection model with improved accuracy
  </Update>

  <Update description="2026-02-17" label="v1.11.44">
    * fix: Add bounds check for page index in PDF text embedding to prevent IndexError crashes
    * fix: Skip cover pages for PDF portfolios during attachment concatenation
    * fix: Fallback to original pages when portfolio has no PDF attachments
    * fix: Fast-fail URL download on non-success HTTP status codes
    * fix: Random checkbox YOLO crash when Conv has no batch normalization
    * fix: HuggingFace model downloads for builds
    * fix: Lazily import probing modules to avoid Modal dependency in on-prem
    * feat: Custom agentic layout postprocessing
    * feat: Routing for parse batches
    * feat: Classify concurrency improvements
    * refactor: Set reducto environment to `onprem` by default
    * chore: Upgrade pytorch and torchvision dependencies
  </Update>

  <Update description="2026-02-13" label="v1.11.42">
    * fix: Detect visual redlines (colored strikethrough/underline) in DOCX change tracking
    * fix: Use min instead of max for checkbox detection
    * fix: CSV parsing truncation and scientific notation for large integers
    * fix: Recover in-progress batches alongside pending ones
    * feat: Fallback to Gemini Flash for improved reliability
    * feat: Intelligent ordering fallbacks
    * feat: Schema adherence model updates
    * feat: Add ONNX model integrity verification with forced fresh model download
    * refactor: Database lock timeout for sync DB engine
  </Update>

  <Update description="2026-02-09" label="v1.11.41">
    * feat: New OCR recognition model for Apple deployments
    * fix: Recover in-progress batches alongside pending ones
    * fix: Add ONNX model integrity verification and force fresh model download
  </Update>

  <Update description="2026-02-06" label="v1.11.40">
    * feat: Classify endpoint with parallelized Gemini Flash Lite probes for document classification
    * feat: Add bucket\_name as alpha option in v3 parse config
    * feat: Enable bucket & KMS ARN override for hybrid VPC deployments
    * feat: Auto region routing for Gemini models
    * feat: Native office conversion alpha flag in v3 config
    * feat: Inference helm charts and kv-base routing
    * feat: Enable flatten for edit endpoint
    * feat: Improved models for standard figure summary
    * feat: Add dimension limit handling for AWS environments
    * fix: Memory leaks and file descriptor leaks in PIL Image handling across OCR and processing pipelines
    * fix: N-squared completion pattern in batch processing for significantly improved performance at scale
    * fix: Race condition in parse completion job processing
    * fix: Argument order bug in pdftext multiprocessing extraction
    * fix: V3 config fixes for on-prem deployments
    * fix: Initialize empty sheets to prevent errors on blank spreadsheets
    * fix: Force resize to fit AWS dimension limits for large documents
    * fix: Image conversion failures now return proper 415 error instead of 500
    * fix: PyPDFForm version update to resolve form filling bug
    * fix: Classify endpoint fixes for improved reliability
    * fix: Offset\_in\_chunk calculation for empty blocks
    * fix: Exclude veryHidden sheets when exclude\_hidden\_sheets is enabled
    * fix: Checkbox detection bug
    * fix: Prioritize S3/BUCKET over GCS when both GCP\_PROJECT\_ID and BUCKET are set
    * fix: cron.py Kubernetes usage
    * fix: Distributed traces with LOGFIRE\_DISTRIBUTED\_TRACING
    * fix: Temperature 0.1 for promptable layout for more deterministic results
    * fix: local-full Dockerfile fix by adding gcc and python3-dev to apt install
    * perf: Hydrate SharedBatchWorker.process\_org\_batch before ThreadPoolExecutor for improved concurrency
    * refactor: Remove enhanced enrich tables, default to same model for simpler table processing
    * chore: Upgrade table models
  </Update>

  <Update description="2026-01-30" label="v1.11.38">
    * feat: V3 config overrides for v2-only and on-prem-only settings
    * feat: List item support and chunk offsets in blocks for improved extraction
    * fix: handle\_required\_fields not adding missing fields to array items in extraction
    * fix: Page marker blocks now include correct page and original\_page values
    * fix: Multi-batch recovery when job processing is interrupted
    * fix: Division by zero error for images with corrupted EXIF data
  </Update>

  <Update description="2026-01-29" label="v1.11.37">
    * fix: Restore V2 OCR defaults (highres OCR system) in V3 on-prem config for consistent behavior
  </Update>

  <Update description="2026-01-29" label="v1.11.36">
    * chore: Bookworm image build configuration for CD pipeline
  </Update>

  <Update description="2026-01-29" label="v1.11.35">
    * feat: Bookworm Dockerfile variant for improved on-prem DOCX→PDF conversion reliability
  </Update>

  <Update description="2026-01-28" label="v1.11.34">
    * fix: DOCX→PDF conversion using LibreOffice from Trixie backports for improved reliability
    * feat: Super-agent integration into /extract pipeline for improved structured data extraction
    * fix: Traceparent propagation for API requests
    * perf: Per-image table predictions for better performance
  </Update>

  <Update description="2026-01-27" label="v1.11.33">
    * feat: Hybrid VPC routing based on header with default AU/EU/US regions
    * feat: Add docx fallbacks for malformed XML and OOXML-format .doc files
    * feat: Change default presigned URL expiration from 1 hour to 12 hours
    * fix: Table edit pattern improvements and preferred edit model changes
  </Update>

  <Update description="2026-01-24" label="v1.11.32">
    * feat: Schema adherence for required keys in extraction
    * feat: Improved table edit granularity
    * fix: Properly propagate password errors for password-protected PDFs
  </Update>

  <Update description="2026-01-20" label="v1.11.31">
    * fix: Anthropic Bedrock on-prem edit calls
    * feat: Add raw XML repair fallback for malformed docx files
    * fix: Local parse hanging for multi-batch documents
    * feat: Schema Optimization Agent for improved extraction accuracy
  </Update>

  <Update description="2026-01-16" label="v1.11.29">
    * fix: Hyperlinks being dropped when OCR extraction mode is enabled
    * feat: Add line level offsets when config is enabled
    * feat: Intelligent Ordering Model API integration
  </Update>

  <Update description="2025-12-15" label="v1.11.25">
    * feat: Add document\_password support to pipeline API for password-protected documents
    * feat: Implement character-level DOCX change tracking
    * fix: Hidden rows and columns handling for spreadsheets
    * feat: Cloudflare R2 Storage Class support
  </Update>

  <Update description="2025-12-02" label="v1.11.21">
    * fix: Settings Overrides for streamlined API config/env var customization
  </Update>

  <Update description="2025-12-02" label="v1.11.20">
    * chore: inference parallelization
    * feat: OCR word and line rotation data propagation
    * fix: layout prediction improvements
    * feat: extract schema adherence
    * fix: empty table model output
  </Update>

  <Update description="2025-12-01" label="v1.11.19">
    * refactor: Document fetching logic
    * feat: Updated ordering model
    * feat(settings): more streamlined customization for models and prompts via API configuration / env variables
  </Update>

  <Update description="2025-11-25" label="v1.11.18">
    * feat: Customizable models for AWS Bedrock using environment variables
    * refactor: Default models updated for AWS Bedrock to `us.anthropic.claude-sonnet-4-5-20250929-v1:0`
  </Update>

  <Update description="2025-11-24" label="v1.11.17">
    * feat: support more edge case custom file mimetypes
  </Update>

  <Update description="2025-11-21" label="v1.11.16">
    * fix: table chunking
    * fix: make enrich tables more robust
  </Update>

  <Update description="2025-11-21" label="v1.11.15">
    * refactor: optimize some DB transactions to not be left open too long
    * feat: Add signatures as a formatting option in v3 config
    * feat: extract schema adherence
    * refactor: optimize enrich tables latency
    * feat: new hybrid OCR implementation
    * feat: add force file mimetype to extension config option
  </Update>

  <Update description="2025-11-17" label="v1.11.14">
    * feat: env var based customization for local KV prompt/model
    * feat: Add priority-based worker routing to skip shared/dedicated workers when priority is not set
    * chore: adding latency sensitive for fast mode in Spreadsheet Agent
    * feat: add OpenAI Responses LLM Provider
    * feat: Allow direct DataDog Tracing with Beta Headers and Logfire Service name handling
  </Update>

  <Update description="2025-11-12" label="v1.11.13">
    * fix: table block chunking
  </Update>

  <Update description="2025-11-11" label="v1.11.12">
    * fix: Chainguard image dependencies
  </Update>

  <Update description="2025-11-10" label="v1.11.11">
    * fix: md5 for FIPS environments
    * fix: Numbers file parsing
  </Update>

  <Update description="2025-11-07" label="v1.11.10">
    * fix: allow invalid surrogates when encoding
    * feat: Add logfire gauge metrics for K8s queue lengths
  </Update>

  <Update description="2025-11-07" label="v1.11.9">
    * feat: Add Azure Blob Storage authentication support for private endpoints
    * fix: race condition with in progress batch -> job completion enqueue
    * fix: logfire logging if logfire token is set
  </Update>

  <Update description="2025-11-06" label="v1.11.8">
    * fix: embed pdf metadata
    * fix: persist results before webhook
    * fix: update cancel\_all and wipe endpoints for on-prem and secure them correctly
    * refactor: cron cleanup function + running frequency
    * fix: Chainguard image dependency issues
  </Update>

  <Update description="2025-11-04" label="v1.11.7">
    * fix: PgDog Helm Chart application version configuration for on-premise deployments
  </Update>

  <Update description="2025-11-04" label="v1.11.6">
    * feat: V3 API config with improved spreadsheet response format and citations support
    * feat: Enhanced table block chunking for better extraction of large tables
    * feat: Agent-in-the-loop (AITL) extraction with generalizable configuration for multiple fields
    * feat: Spreadsheet figure summary support for better data visualization
    * feat: LLM provider preference configuration for v3 API (specify OpenAI, Anthropic, Google, etc.)
    * feat: Helm chart PgDog dependency for PostgreSQL monitoring
    * feat: Affinity and topologySpreadConstraints support in Helm charts for advanced pod scheduling
    * fix: OCR system handling in v3 config
    * fix: Race condition for single batch jobs
    * fix: Webhook delivery on Kubernetes environments
    * fix: Parse job update batching for improved database performance
    * fix: DOCX timeout increased for large document processing
    * fix: Table merging with XML parsing improvements
    * fix: Underline/strikethrough character threshold adjustments
    * chore: Datadog integration for enhanced monitoring
  </Update>

  <Update description="2025-10-30" label="v1.11.5">
    * feat: Reduce PDF output size by avoiding text layer rasterization
    * feat: Custom chunking response format support
    * fix: AITL configuration handling for proper field validation
  </Update>

  <Update description="2025-10-28" label="v1.11.4">
    * feat: Agent-in-the-loop (AITL) documentation exposed and configuration updated to handle multiple fields
    * feat: Hyperlink extraction support in PDF parsing - preserves document links in output
    * feat: PostgreSQL Helm dependency migrated to OCI registry for better reliability
    * feat: Spreadsheet figure summary generation for visual data extraction
    * fix: OCR system switching for v3 config
    * fix: Change tracking for accurate document diff detection
  </Update>

  <Update description="2025-10-25" label="v1.11.3">
    * feat: Helm charts now support affinity and topologySpreadConstraints for advanced Kubernetes pod placement control
    * feat: Table merging heuristics improved
    * fix: Webhook delivery on Kubernetes fixed for reliable notification
    * fix: Underline and strikethrough detection threshold adjusted for better accuracy
    * fix: Safe fill implementation used everywhere in PDF form filling
    * chore: Datadog monitoring integration
  </Update>

  <Update description="2025-10-22" label="v1.11.2">
    * feat: V3 API config support - new configuration format for improved extraction control
    * feat: Naive table merging for bulk processing with better cross-page detection
    * feat: Figure summary enhancements and configuration via API
    * feat: LLM provider preference support in v3 config
    * feat: Tool use support for Anthropic provider
    * feat: /openapi.json and /openapi-legacy.json endpoints for API schema access
    * feat: Split implementation improvements
    * fix: Database transaction handling in Kubernetes - don't keep transactions open
    * fix: Experimental table citations now default to true in v3
    * fix: Extract confidence concurrency handling
    * chore: Figure summarization adjusted for more thorough output
  </Update>

  <Update description="2025-10-12" label="v1.11.1">
    * fix: Helm chart labels for retry stale jobs cronjob
  </Update>

  <Update description="2025-10-12" label="v1.11.0">
    * fix: Build configuration cleanup
  </Update>

  <Update description="2025-10-12" label="v1.10.38">
    * feat: Support for custom extract models via LLM service, enabling on-premise model configurations
    * fix: PDF form dropdown filling improvements with proper context and option handling
    * fix: Excel column to string conversion using openpyxl
  </Update>

  <Update description="2025-10-10" label="v1.10.37">
    * feat: New /jobs endpoint with cursor-based pagination for efficient job listing and filtering
    * feat: New PDF edit flow using parse pipeline for improved form filling accuracy and performance
    * feat: Schema-less extraction generation - automatically infer extraction schemas when not provided
    * feat: Enhanced table merging across pages in HTML documents with improved row/column detection
    * feat: Improved spreadsheet agent with citations support and performance optimizations
    * feat: Parallelized batch results loading from storage for faster retrieval
    * fix: Multi-page TIFF and JPEG handling for proper page extraction
    * fix: Password-protected landscape PDF processing
    * fix: Text overlay visibility issues during edit flow
    * fix: Spreadsheet agent formatting values in preview mode
    * chore: Docker base image upgraded to Debian Trixie for better security and compatibility
    * chore: Enhanced mode set as default for better quality
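
The cursor-based pagination added to `/jobs` in this release can be consumed with a loop like the one below. This is a minimal sketch, not the documented client: the `fetch_page` callable and the `jobs` and `next_cursor` field names are hypothetical stand-ins, so check the response schema in the API reference before relying on them.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"jobs": [...], "next_cursor": "..." or None}
Page = dict


def iter_jobs(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every job by following cursors until none is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)          # first call passes cursor=None
        yield from page.get("jobs", [])
        cursor = page.get("next_cursor")   # assumed field name
        if not cursor:
            break
```

Passing the fetcher in as a callable keeps the loop independent of any HTTP library, so it can be exercised with canned pages before pointing it at a real deployment.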
  </Update>

  <Update description="2025-10-08" label="v1.10.36">
    * feat: Priority handling for time-sensitive extraction requests with improved page mapping reasoning
    * feat: Improved Vertex AI Gemini region configuration
    * fix: Array extract error handling - prevents crashes from malformed LLM output
    * fix: Better concurrency management for key-value extraction
    * fix: Split configuration handling improvements
    * chore: OpenAI API retry logic for handling slow responses
    * chore: Exponential backoff for split operations
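
For readers unfamiliar with the term, exponential backoff spaces retries out by multiplying the wait after each failure. The sketch below illustrates the general technique only; the base delay, growth factor, and cap are illustrative assumptions and not the values Reducto uses for split operations.

```python
def backoff_delays(base_s: float, factor: float, max_s: float, attempts: int) -> list[float]:
    """Return the wait (in seconds) before each retry, capped at max_s."""
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    # Delay i is base_s * factor**i, never exceeding the cap.
    return [min(max_s, base_s * factor ** i) for i in range(attempts)]
```

The cap keeps late retries from waiting unreasonably long once the exponential term grows past a useful interval.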
  </Update>

  <Update description="2025-10-07" label="v1.10.35">
    * feat: Improved layout inference with reduced latency
  </Update>

  <Update description="2025-10-04" label="v1.10.34">
    * feat: PDF edit overlay improvements using OCR-B font for better text rendering
    * fix: Timeout configuration improvements for long-running operations
  </Update>

  <Update description="2025-10-01" label="v1.10.33">
    * fix: Worker stability improvements and bug fixes
  </Update>

  <Update description="2025-09-29" label="v1.10.32">
    * feat: Spreadsheet extraction agent enhancements for better cell and table detection
    * fix: Citation formatting improvements across extraction outputs
  </Update>

  <Update description="2025-09-24" label="v1.10.31">
    * feat: Cross-page table merging improvements with naive row merging implementation
    * fix: Performance optimizations for large document processing
  </Update>

  <Update description="2025-09-24" label="v1.10.30">
    * feat: Enhanced extraction pipeline with improved data handling
    * fix: Error handling improvements throughout the system
  </Update>

  <Update description="2025-09-18" label="v1.10.29">
    * feat: GCP Workload Identity support for Google Cloud deployments
    * feat: AWS region override configuration for flexible cloud deployments
    * fix: Worker stability enhancements
  </Update>

  <Update description="2025-09-17" label="v1.10.28">
    * feat: Prometheus alerting integration for monitoring
    * fix: Reliability enhancements for long-running jobs
  </Update>

  <Update description="2025-09-17" label="v1.10.27">
    * feat: opt in or out of sending billing usage to the license server
    * feat: block OpenAI invocation with BLOCK\_OPENAI env var
    * feat: signature detection
    * fix: helm chart template rendering
    * fix: figure summarization now correctly overrides the default prompt when the user supplies one
    * refactor: update equation detection to use LLMs provided by the on-premise deployment
  </Update>

  <Update description="2025-01-16" label="v1.10.24">
    * feat: configurable S3 endpoint url
  </Update>

  <Update description="2025-09-08" label="v1.10.21">
    * feat: character-level support for Azure in hybrid mode
    * feat: support for docx comments
  </Update>

  <Update description="2025-09-03" label="v1.10.16">
    * feat: split support with Gemini on Vertex AI
  </Update>

  <Update description="2025-09-01" label="v1.10.13">
    * refactor: updated LLM service with vision/text
    * fix: ensure formatted text (e.g., underlines, strikethroughs) is not subsumed by key-value detection
  </Update>

  <Update description="2025-08-28" label="v1.10.12">
    * feat: secret management in helm chart
    * feat: add .msg file support
  </Update>

  <Update description="2025-08-26" label="v1.10.11">
    * feat: HEIC file format support for image processing
    * feat: character-level OCR detection for strikethrough and underline formatting
    * feat: parallelize and optimize PDF metadata embedding for improved performance
    * fix: add locks to prevent race conditions
    * fix: timeout handling for DOCX to PDF conversion with proper 400 status codes
  </Update>

  <Update description="2025-08-20" label="v1.10.8">
    * feat: configurable S3 SSL options for boto
    * feat: /billing-usage API for exporting usage in air-gapped deployments
  </Update>

  <Update description="2025-08-15" label="v1.10.7">
    * feat: Support for Google Cloud Storage gs\:// document url
    * feat: timeout and fail jobs and batches when queued for GLOBAL\_QUEUE\_TIMEOUT\_SEC
    * feat: BackendConfig support in Helm Chart for GCP
  </Update>

  <Update description="2025-08-01" label="v1.10.4">
    * feat: generate extract schema if no schema was provided
    * feat: add form schema to edit documentation
  </Update>

  <Update description="2025-08-01" label="v1.10.3">
    * feat: improve cold starts
    * fix: fine-grained citation fixes
  </Update>

  <Update description="2025-07-31" label="v1.10.2">
    * feat: DOCX improvements
    * feat: added schema token limits
    * feat: add customizations to auth via environment variables
    * feat: faster model inference optimizations
  </Update>

  <Update description="2025-07-30" label="v1.10.1">
    * fix: OCR image resizing improvements
  </Update>

  <Update description="2025-07-29" label="v1.10.0">
    * feat: implement fault-tolerant webhook delivery
    * fix: table headers for HTML parsing
    * feat: add secret metadata parameter to /job/{job_id} endpoint
    * feat: include config when include\_metadata is enabled for job endpoint
    * feat: adding sheet color to output
    * feat: clean up refs in extract output
    * feat: excel table color mapping implementation
    * feat: enhance merge tables
    * feat: table feedback loop using the enrich table flag
    * feat: add litellm proxy model for 'best'
  </Update>

  <Update description="2025-07-19" label="v1.9.94">
    * feat: long-polling with timeout (seconds) query param for `/job/{job_id}`
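
A long-poll request for this endpoint might be built as in the sketch below. It assumes the query parameter is literally named `timeout`, as the entry above suggests, and uses a hypothetical base URL; confirm both against your deployment before use.

```python
from urllib.parse import quote, urlencode


def job_poll_url(base_url: str, job_id: str, timeout_s: int) -> str:
    """Build a /job/{job_id} URL that asks the server to hold the request open."""
    if timeout_s < 0:
        raise ValueError("timeout_s must be non-negative")
    path = f"/job/{quote(job_id, safe='')}"       # escape the job ID for use in a path
    query = urlencode({"timeout": timeout_s})      # assumed parameter name, in seconds
    return f"{base_url.rstrip('/')}{path}?{query}"


# Hypothetical host, for illustration only.
print(job_poll_url("https://reducto.internal.example", "abc123", 30))
```

When long-polling, set the HTTP client's own read timeout somewhat higher than `timeout_s` so the client does not give up before the server responds.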
  </Update>

  <Update description="2025-07-16" label="v1.9.93">
    * fix: job type and add duration field in `/jobs` endpoint
    * fix: Preserve all decimals in md tables
    * feat: support for GCP
  </Update>

  <Update description="2025-07-14" label="v1.9.92">
    * feat: presentation detection and kv-disabling
  </Update>

  <Update description="2025-07-11" label="v1.9.91">
    * fix: Strikethrough and underline tuning
  </Update>

  <Update description="2025-07-11" label="v1.9.90">
    * feat: implement RTF support
    * fix: Offsets for tables extracted from Excel sheets
    * feat: add optional confidence fields to OCRWord and OCRLine
    * feat: add source in `/jobs`
    * feat: initial change detection implementation
  </Update>

  <Update description="2025-07-10" label="v1.9.89">
    * docs: clarify Excel citation coordinate system differences
    * feat: Integrate spreadsheet agent for extract
    * feat: Convert images to pdf for pdf\_url
    * fix: Allow Gemini to output `<empty>` and `<signature>` fields for key-value
    * chore: update textract quota
    * feat: option for multiplatform builds for onprem
    * docs: Add Model Governance Policy to security section
    * fix: Split regex fix for subcategory
    * fix: Add retries to spreadsheet agent
    * fix: Merging splits in the new format
    * fix: Merge array\_extract citations based on extract results
    * docs: Add chart extraction documentation page
    * feat: ship both small/large models in built images
    * feat: allow changing default use\_gpu\_ocr config value based on env var
  </Update>

  <Update description="2025-07-03" label="v1.9.88">
    * feat: handle multipart/form-data content type errors on /split endpoint
    * perf: Latency fix for agentic unicode changes
    * fix: sanitize html file upload path to s3
    * refactor: Add strict typing for SplitResult.splits
    * feat: Helm chart and values for GPU-based OCR deployment
  </Update>

  <Update description="2025-06-23" label="v1.9.86">
    * feat: enhanced DOCX change tracking with improved underline detection and formatting accuracy
    * fix: optimized model server initialization to reduce startup time and improve processing performance
  </Update>

  <Update description="2025-06-23" label="v1.9.85">
    * fix: resolved document conversion hangs caused by separate executor processes for improved reliability
  </Update>

  <Update description="2025-06-20" label="v1.9.84">
    * feat: added cancel\_all endpoint for on-prem deployments to cancel all running jobs at once
    * feat: enhanced extraction with schema key normalization and improved page range references in citations
    * feat: added global timeout overrides for better performance control and reliability
    * fix: resolved document conversion hangs with global timeout implementation
    * fix: improved change tracking validation with proper error handling
  </Update>

  <Update description="2025-06-18" label="v1.9.83">
    * feat: enhanced DOCX metadata extraction for improved change tracking capabilities
    * fix: improved Excel citation handling when OCR data is not available
  </Update>

  <Update description="2025-06-18" label="v1.9.82">
    * feat: added on-prem licensing alerts when connection to license.reducto.ai fails
    * feat: implemented timeout functionality for improved processing performance and reliability
    * fix: improved authentication on /upload and /cancel endpoints for better security
  </Update>

  <Update description="2025-06-13" label="v1.9.81">
    * feat: enhanced multilingual OCR text embedding with support for Latin, CJK, Cyrillic, and Devanagari scripts using custom Unifont font
    * feat: file-based authentication system for on-prem deployments with Kubernetes secret mounting support
    * feat: automatic file cleanup system with configurable retention windows (default 60 minutes) to manage storage usage
    * fix: improved authentication reliability with retry logic for API validation calls
    * fix: enhanced extraction pipeline to handle None extract\_outputs and improve data merging
    * fix: native office conversion now skips files over 150MB and falls back to LibreOffice for better reliability
    * fix: improved citation confidence handling when confidence values are null
    * fix: batch processing improvements to keep batches alive for large documents
  </Update>

  <Update description="2025-06-11" label="v1.9.80">
    * feat: enhanced array extraction to work with non-array fields for improved data extraction flexibility
    * feat: improved LLM error handling and timeout support for more reliable model calls
    * feat: added support for OpenDocument Text (.odt) file uploads through existing LibreOffice conversion pipeline
    * fix: improved block merging logic to properly update table content during document enrichment
    * fix: added retry logic for database errors on job status requests to improve reliability
    * fix: preserve empty blocks (such as figures with no content) in final document layout
  </Update>

  <Update description="2025-06-09" label="v1.9.79">
    * feat: default to big extract model
    * feat: automatic file cleanup
    * fix: support for Azure OpenAI
    * feat: add hidden sheet/row/column filtering for Excel processing
    * fix: corrected jsonbbox and citations for Excel
    * fix: PDF processing timeout
    * fix: DOCX to PDF conversion errors now return status code 400 instead of 500
  </Update>

  <Update description="2025-06-05" label="v1.9.78">
    * feat: enable change tracking capability
    * fix: remove the large figure filter in dfine layout model postprocessing
  </Update>

  <Update description="2025-06-04" label="v1.9.77">
    * feat: Split blocks for array\_extract on Excel to separate pages
  </Update>

  <Update description="2025-06-04" label="v1.9.76">
    * feat: Persist the full result for URL results to the persist bucket
    * feat: Query jobs by user-id and fair queueing docs
    * fix: Edit conditionals so full tables aren't returned in citations
    * fix: Fix job type error when cancelling a job
    * fix: on-prem changelog auth on light and dark mode
    * fix: Rename file to include guessed extension if one isn't already included
    * fix: handle empty bbox arrays in layout postprocess calculations
  </Update>

  <Update description="2025-06-02" label="v1.9.75">
    * feat: update replicated helm chart
    * feat: surface all table citations in v2
  </Update>

  <Update description="2025-05-30" label="v1.9.74">
    * feat: internal webhook via IPC on job completion
    * feat: expose table citations in extraction results
    * feat: add persist config option that persists parsebatches and results
    * fix: keep batch alive for large HTML documents
    * fix: check for empty document\_url list in extract
    * refactor: PDF editing with PyPDFForm for improved form handling
  </Update>

  <Update description="2025-05-29" label="v1.9.73">
    * feat: update extraction pipeline with improved array handling
    * fix: f-string usage in logfire calls across the codebase
  </Update>

  <Update description="2025-05-28" label="v1.9.72">
    * fix: OpenAI vision LLM calls in LLM router
  </Update>

  <Update description="2025-05-28" label="v1.9.71">
    * fix: root level acroform rendering (preserve form values)
    * feat: add on-prem config option to enable figure summaries for all figures
    * feat: add exclude\_configs query param to /jobs endpoint to reduce response size
    * fix: on-prem CD now corrects the `/version` URL to the latest version number
  </Update>

  <Update description="2025-05-23" label="v1.9.70">
    Enhanced support for OCR text embedding in PDFs with the `embed_text_metadata_pdf` flag.
  </Update>

  <Update description="2025-05-23" label="v1.9.69">
    Added support for routing on-premise deployments to the v2 extraction pipeline and improved exception handling for subscription errors in Stripe usage logging. Fixed the Pydantic AI Agent tools configuration in the document editing functionality.
  </Update>

  <Update description="2025-05-23" label="v1.9.68">
    Added Azure OpenAI support and improved the table model with a new fallback
    mechanism. Enhanced webhook validation and added support for selective
    customer notifications via the target channels parameter.
  </Update>

  <Update description="2025-05-19" label="v1.9.67">
    Added support for LiteLLM Proxy configuration via environment variables:

    * `LITELLM_PROXY_URL`: URL of the LiteLLM Proxy
    * `LITELLM_PROXY_FAST_MODEL`: Fast model to route to via the proxy
    * `LITELLM_PROXY_ACCURATE_MODEL`: Accurate model to route to via the proxy

    When using the proxy configuration:

    * Both fast and accurate models must be defined if using the proxy URL
    * Existing LiteLLM routing options are overridden when proxy settings are active

    This enables easier integration with centralized proxy setups for model routing and observability.
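
    As a rough illustration of the rule that both models must be defined when the proxy URL is set, the following Python sketch validates the three variables before use. The `load_proxy_settings` helper, its return shape, and the error message are hypothetical and not part of the product. Only the environment variable names come from this release note.

```python
import os
from typing import Mapping, Optional


def load_proxy_settings(env: Mapping[str, str] = os.environ) -> Optional[dict]:
    """Return proxy settings, or None when no proxy URL is configured.

    Hypothetical helper for illustration only; it mirrors the documented
    rule that both models are required once LITELLM_PROXY_URL is set.
    """
    url = env.get("LITELLM_PROXY_URL")
    if not url:
        # Proxy inactive: existing LiteLLM routing options stay in effect.
        return None

    fast = env.get("LITELLM_PROXY_FAST_MODEL")
    accurate = env.get("LITELLM_PROXY_ACCURATE_MODEL")
    missing = [
        name
        for name, value in (
            ("LITELLM_PROXY_FAST_MODEL", fast),
            ("LITELLM_PROXY_ACCURATE_MODEL", accurate),
        )
        if not value
    ]
    if missing:
        raise ValueError(
            "LITELLM_PROXY_URL is set but these variables are missing: "
            + ", ".join(missing)
        )
    return {"url": url, "fast_model": fast, "accurate_model": accurate}
```

    Running a check like this at startup surfaces a half-configured proxy immediately instead of at the first model call.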
  </Update>

  <Update description="2025-04-07" label="v1.9.48">
    Added a `/wipe` endpoint to the on-prem API that wipes the database of all parse jobs, batches, and tasks. It is available only to on-prem customers and is intended as a fail-safe.

    Make sure this endpoint is not available or exposed to end users; it should be a backend-only fail-safe. Please let us know if you'd like it disabled or removed in your deployment.
  </Update>

  <Update description="2025-04-07" label="v1.9.47">
    This release includes query optimizations that significantly reduce CPU usage of the Postgres DB at high document volumes (e.g. > 1k pg/min).

    In Google Cloud environments, we improved skew detection by updating our thresholds to detect skew more intelligently in certain cases. We also fixed several bugs affecting deployments that specify an LLM provider preference.
  </Update>
</PasswordProtect>
