A

Agentic mode is an enhancement that uses vision language models (VLMs) to review and correct parsing output. It is available for three scopes: text (handwritten content, signatures), table (complex table structures), and figure (charts and diagrams). It adds latency and credits but improves accuracy for difficult documents. See Agentic Modes.
Async endpoints are API endpoints that return immediately with a job_id instead of waiting for processing to complete. Examples: /parse_async, /extract_async. Use these for large documents, batch processing, or when you want webhook notifications. See Async Processing.

B

A bounding box is the set of coordinates describing where a block appears on a page. All values are normalized to [0, 1] relative to page dimensions. Fields: left, top, width, height, page, original_page. See Response Format.
A block is the atomic content element in Reducto’s output. Every paragraph, table, header, figure, list item, etc. is a separate block with its own type, content, bounding box, and confidence score. Blocks are grouped into chunks.
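Because bounding box values are normalized, drawing a box on a rendered page means scaling by that page's pixel size. The following is a minimal sketch, not Reducto's own code: the `BBox` class and `to_pixels` helper are hypothetical names, and it assumes `left` and `top` are measured from the top-left corner of the page.

```python
from dataclasses import dataclass


@dataclass
class BBox:
    """Normalized bounding box, using the field names from the glossary entry."""
    left: float
    top: float
    width: float
    height: float
    page: int


def to_pixels(bbox: BBox, page_width_px: int, page_height_px: int) -> tuple:
    """Scale a normalized box to (x0, y0, x1, y1) pixel coordinates.

    Assumption: the origin is the top-left corner, so the right edge is
    left + width and the bottom edge is top + height.
    """
    x0 = bbox.left * page_width_px
    y0 = bbox.top * page_height_px
    x1 = (bbox.left + bbox.width) * page_width_px
    y1 = (bbox.top + bbox.height) * page_height_px
    return (round(x0), round(y0), round(x1), round(y1))


box = BBox(left=0.25, top=0.5, width=0.5, height=0.25, page=1)
print(to_pixels(box, 1000, 2000))  # (250, 1000, 750, 1500)
```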

C

A chunk is a group of related blocks returned by Parse. The chunking mode determines how blocks are grouped. Each chunk has content (raw Markdown), embed (optimized for vector embeddings), and blocks (individual elements with metadata). See Chunking Methods.
Confidence is a measure of parsing accuracy. String confidence is "high" or "low". Granular confidence provides numeric scores (0-1) in the parse_confidence and extract_confidence fields.
The content field in a chunk contains the raw Markdown representation of extracted text. Tables appear in their original format (HTML/Markdown). Use for display purposes.
Credits are the billing unit for Reducto API usage. Different endpoints and configurations consume different credit amounts. See Credit Usage.
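The chunk fields described above can be sketched as a plain dictionary. This is an illustration under stated assumptions: the top-level keys `content`, `embed`, and `blocks` come from the entries above, while the per-block key names (`type`, `confidence`) and both helper functions are hypothetical.

```python
def text_for_embedding(chunk: dict) -> str:
    """Prefer the embed field for vector search; fall back to raw content."""
    return chunk.get("embed") or chunk["content"]


def low_confidence_blocks(chunk: dict) -> list:
    """Return blocks whose string confidence is "low", e.g. for manual review."""
    return [b for b in chunk.get("blocks", []) if b.get("confidence") == "low"]


# Hypothetical sample chunk, shaped after the glossary description.
sample_chunk = {
    "content": "| Item | Price |\n|---|---|\n| Widget | 4.00 |",
    "embed": "A table listing one item, Widget, priced at 4.00.",
    "blocks": [
        {"type": "Table", "confidence": "low"},
        {"type": "Text", "confidence": "high"},
    ],
}

print(text_for_embedding(sample_chunk))
print(len(low_confidence_blocks(sample_chunk)))
```

Keeping the fallback to `content` means the same code still works for chunks that carry no separate `embed` text.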

E

The embed field in a chunk is optimized for vector embeddings. When embedding_optimized: true is set, tables are converted to natural language summaries that embed better than raw Markdown. Use for vector databases and semantic search.
The /extract endpoint pulls structured data from documents according to a JSON schema you define. Returns typed fields with values and citations. Different from Parse, which returns the full document content.

F

Setting force_url_result: true makes the API always return results as a URL pointing to JSON, rather than inline in the response. Useful for consistent handling or very large documents. See Processing Settings.

J

A job ID is a unique identifier returned when you submit an async request or complete any API call. Format: UUID like 7600c8c5-a52f-49d2-8a7d-d75d1b51e141. Used to retrieve results, check status, or chain endpoints.
jobid:// is a special URL format (jobid://abc123) that lets you reference a previously parsed document without re-uploading or re-parsing it. Use with Extract or Split to avoid duplicate processing.
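Chaining endpoints comes down to turning a job ID into a jobid:// reference. A minimal sketch follows; the helper name `jobid_url` is hypothetical, and how the resulting string is passed to Extract or Split is not shown here because it depends on the API reference.

```python
JOBID_PREFIX = "jobid://"


def jobid_url(job_id: str) -> str:
    """Build a jobid:// reference from a job ID returned by an earlier call."""
    cleaned = job_id.strip()
    if not cleaned:
        raise ValueError("job_id must not be empty")
    # Avoid double-prefixing if the caller already holds a jobid:// reference.
    if cleaned.startswith(JOBID_PREFIX):
        return cleaned
    return JOBID_PREFIX + cleaned


print(jobid_url("7600c8c5-a52f-49d2-8a7d-d75d1b51e141"))
# jobid://7600c8c5-a52f-49d2-8a7d-d75d1b51e141
```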

O

The OCR system is the text extraction engine. standard (default) is Reducto's best multilingual OCR option. legacy supports only Germanic languages and exists for backwards compatibility. See OCR Settings.

P

The /parse endpoint extracts text, tables, figures, and structure from documents. Returns chunks and blocks with positional metadata. The foundation for most Reducto workflows.
Setting persist_results: true stores job results indefinitely (requires Studio). Without this, results are deleted after 12 hours per the zero data retention (ZDR) policy.
A pipeline is a saved configuration in Reducto Studio that bundles parsing, extraction, or other operations into a single API call. Call it via the /pipeline endpoint with a pipeline_id.
Async jobs can be submitted with priority: true to process ahead of non-priority async jobs. Sync requests are always prioritized over async.

R

reducto:// is a special URL format (reducto://abc123) returned by the /upload endpoint. Use it to reference uploaded files in subsequent API calls without re-uploading.

S

A schema is a JSON Schema definition used with the Extract endpoint to specify what data to pull from documents. It defines field names, types, and structure.
The /split endpoint divides documents into logical sections based on natural language descriptions you provide. Returns page ranges for each section with confidence scores.
Reducto Studio is the web interface for configuring pipelines, viewing results, managing API keys, and monitoring usage. Pipelines created in Studio can be called via the API.
Sync endpoints are API endpoints that block until processing completes and return the full result. Examples: /parse, /extract. Best for interactive applications with smaller documents.
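To make the schema entry in this section concrete, here is a small JSON Schema written as a Python dictionary. It is only an illustration: the field names (`invoice_number`, `total_amount`, `line_items`) are hypothetical, and the way a schema is attached to an Extract request is not shown because it depends on the API reference.

```python
import json

# Hypothetical schema for an invoice; field names are illustrative only.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "Invoice identifier"},
        "total_amount": {"type": "number", "description": "Grand total"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total_amount"],
}

# Serialize to JSON text, the form in which a schema travels in a request body.
print(json.dumps(invoice_schema, indent=2))
```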

U

The /upload endpoint accepts direct file uploads and returns a reducto:// URL for use in other API calls. Use for local files rather than URLs.
When a Parse response is too large for inline delivery, result.type is "url" and you must fetch the content from result.url. The threshold is approximately 6MB response size.
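Client code can hide the difference between inline and URL results behind one helper. The sketch below assumes only what the entry above states, namely that `result.type` is "url" and the content lives at `result.url`; the helper names and the treatment of every other type as inline are assumptions.

```python
import json
from urllib.request import urlopen


def _fetch_json(url: str) -> dict:
    """Download and decode a JSON document (standard library only)."""
    with urlopen(url) as resp:
        return json.load(resp)


def resolve_result(result: dict, fetch=_fetch_json) -> dict:
    """Return the full result, following result["url"] when type is "url".

    Assumption: any other type means the content is already inline.
    The fetch parameter is injectable so the logic can be tested offline.
    """
    if result.get("type") == "url":
        return fetch(result["url"])
    return result


# Offline demonstration with a stand-in fetcher and a placeholder URL.
def fake_fetch(url):
    return {"chunks": [], "fetched_from": url}


print(resolve_result({"type": "url", "url": "https://example.com/result.json"}, fetch=fake_fetch))
```

Calling the helper on every Parse response keeps downstream code identical whether or not force_url_result is enabled.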

Z

Zero data retention (ZDR) is Reducto’s default data policy. Uploaded documents and job results are deleted within 12 hours. Enable persist_results: true to keep results longer (requires Studio opt-in).