Glossary

A

Agentic Mode

An enhancement that uses vision language models (VLMs) to review and correct parsing output. Available for three scopes: text (handwritten content, signatures), table (complex table structures), and figure (charts and diagrams). Adds latency and credits but improves accuracy for difficult documents. See Agentic Modes.

Async Endpoint

An API endpoint that returns immediately with a job_id instead of waiting for processing to complete. Examples: /parse_async, /extract_async. Use these endpoints for large documents, batch processing, or when you want webhook notifications. See Async Processing.

B

Bounding Box (bbox)

Coordinates describing where a block appears on a page. All values are normalized to [0, 1] relative to page dimensions. Fields: left, top, width, height, page, original_page. See Response Format.
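
Because the values are normalized, drawing or cropping a block requires scaling by the rendered page size. A minimal sketch of that conversion, assuming the bbox arrives as a dictionary with the field names listed above; the helper name bbox_to_pixels and the page dimensions are illustrative, not part of the Reducto API:

```python
def bbox_to_pixels(bbox, page_width, page_height):
    """Convert a normalized bbox to integer pixel corners (x0, y0, x1, y1).

    Assumes left/top/width/height are fractions of the page size in [0, 1].
    """
    x0 = bbox["left"] * page_width
    y0 = bbox["top"] * page_height
    x1 = (bbox["left"] + bbox["width"]) * page_width
    y1 = (bbox["top"] + bbox["height"]) * page_height
    return tuple(round(v) for v in (x0, y0, x1, y1))


# Illustrative bbox covering the middle band of the lower half of a page.
example_bbox = {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25, "page": 1}
pixels = bbox_to_pixels(example_bbox, page_width=1000, page_height=800)
print(pixels)
```

The page width and height here are whatever size you render the page at; the normalized values themselves do not depend on resolution.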

Block

The atomic content element in Reducto’s output. Every paragraph, table, header, figure, list item, etc. is a separate block with its own type, content, bounding box, and confidence score. Blocks are grouped into chunks.

C

Chunk

A group of related blocks returned by Parse. Chunking mode determines how blocks are grouped. Each chunk has content (raw Markdown), embed (optimized for vector embeddings), and blocks (individual elements with metadata). See Chunking Methods.

Confidence Score

A measure of parsing accuracy. String confidence is "high" or "low". Granular confidence provides numeric scores (0-1) in parse_confidence and extract_confidence fields.
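
One common use of the string confidence is to flag blocks for human review. The sketch below assumes each chunk is a dictionary with a blocks list and each block carries a confidence key holding "high" or "low"; the exact response layout is an assumption based on the definitions in this glossary, and low_confidence_blocks is a hypothetical helper:

```python
def low_confidence_blocks(chunks):
    """Collect every block whose string confidence is "low"."""
    flagged = []
    for chunk in chunks:
        for block in chunk.get("blocks", []):
            if block.get("confidence") == "low":
                flagged.append(block)
    return flagged


# Hand-written sample data in the assumed shape, not real API output.
sample_chunks = [
    {"blocks": [
        {"type": "Text", "content": "Invoice total", "confidence": "high"},
        {"type": "Text", "content": "handwritten note", "confidence": "low"},
    ]},
    {"blocks": [
        {"type": "Table", "content": "<table>...</table>", "confidence": "low"},
    ]},
]

needs_review = low_confidence_blocks(sample_chunks)
print([block["content"] for block in needs_review])
```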

Content Field

The content field in a chunk contains the raw Markdown representation of extracted text. Tables appear in their original format (HTML or Markdown). Use this field for display purposes.

Credits

The billing unit for Reducto API usage. Different endpoints and configurations consume different credit amounts. See Credit Usage.

E

Embed Field

The embed field in a chunk is optimized for vector embeddings. When embedding_optimized: true is set, tables are converted to natural language summaries that embed better than raw Markdown. Use this field for vector databases and semantic search.
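
When preparing text for a vector database, you would typically read embed and keep content for display. A small sketch, assuming chunks are dictionaries with embed and content keys as described in this glossary; texts_for_embedding is a hypothetical helper, and the fallback to content is a design choice made here, not documented behavior:

```python
def texts_for_embedding(chunks):
    """Return one string per chunk, preferring embed over content.

    Falls back to content when embed is missing or empty (an assumption
    of this sketch, so that no chunk is silently dropped).
    """
    return [chunk.get("embed") or chunk.get("content", "") for chunk in chunks]


# Hand-written sample data, not real API output.
sample_chunks = [
    {"content": "| Q1 | 10 |", "embed": "The table shows Q1 revenue of 10."},
    {"content": "Plain paragraph.", "embed": ""},
]

texts = texts_for_embedding(sample_chunks)
print(texts)
```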

Extract

The /extract endpoint pulls structured data from documents according to a JSON schema you define. It returns typed fields with values and citations, unlike Parse, which returns the full document content.

F

Force URL Result

Setting force_url_result: true makes the API always return results as a URL pointing to JSON, rather than inline in the response. Useful for consistent handling or very large documents. See Processing Settings.

J

Job ID

A unique identifier returned when you submit an async request or complete any API call. Format: UUID like 7600c8c5-a52f-49d2-8a7d-d75d1b51e141. Used to retrieve results, check status, or chain endpoints.

jobid:// URL

A special URL format (jobid://abc123) that lets you reference a previously parsed document without re-uploading or re-parsing it. Use with Extract or Split to avoid duplicate processing.

O

OCR System

The text extraction engine. standard (default) is the best multilingual OCR system. legacy only supports Germanic languages and exists for backwards compatibility. See OCR Settings.

P

Parse

The /parse endpoint extracts text, tables, figures, and structure from documents. Returns chunks and blocks with positional metadata. The foundation for most Reducto workflows.

Persist Results

Setting persist_results: true stores job results indefinitely (requires Studio). Without this, results are deleted after 12 hours per the zero data retention (ZDR) policy.

Pipeline

A saved configuration in Reducto Studio that bundles parsing, extraction, or other operations into a single API call. Call it via the /pipeline endpoint with a pipeline_id.

Priority

Async jobs can be submitted with priority: true to process ahead of non-priority async jobs. Sync requests are always prioritized over async.

R

reducto:// URL

A special URL format (reducto://abc123) returned by the /upload endpoint. Use this to reference uploaded files in subsequent API calls without re-uploading.
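
This glossary defines two special URL schemes, jobid:// and reducto://, alongside ordinary document URLs. The sketch below shows one way client code might build and tell them apart; the function names and the "job", "upload", and "external" labels are hypothetical and exist only for illustration:

```python
def jobid_url(job_id):
    """Build a jobid:// reference from a job ID returned by a previous call."""
    return f"jobid://{job_id}"


def reference_kind(document_url):
    """Classify a document reference by its scheme prefix."""
    if document_url.startswith("jobid://"):
        return "job"        # reuses an earlier parse result
    if document_url.startswith("reducto://"):
        return "upload"     # file previously sent to /upload
    return "external"       # any other URL


reference = jobid_url("7600c8c5-a52f-49d2-8a7d-d75d1b51e141")
print(reference, reference_kind(reference))
```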

S

Schema

A JSON Schema definition used with the Extract endpoint to specify what data to pull from documents. Defines field names, types, and structure.
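
For illustration, a schema for a simple invoice might look like the following. The field names (invoice_number, total, line_items) are invented for this example; only the general JSON Schema structure (type, properties, items, required) is standard:

```python
import json

# Hypothetical invoice schema; every field name below is illustrative.
invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
    "required": ["invoice_number", "total"],
}

# Serialize to JSON text, the form in which a schema is sent over HTTP.
schema_json = json.dumps(invoice_schema, indent=2)
print(schema_json)
```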

Split

The /split endpoint divides documents into logical sections based on natural language descriptions you provide. Returns page ranges for each section with confidence scores.

Studio

Reducto Studio is the web interface for configuring pipelines, viewing results, managing API keys, and monitoring usage. Pipelines created in Studio can be called via the API.

Sync Endpoint

An API endpoint that blocks until processing completes and returns the full result. Examples: /parse, /extract. Best for interactive applications with smaller documents.

U

Upload

The /upload endpoint accepts direct file uploads and returns a reducto:// URL for use in other API calls. Use for local files rather than URLs.

URL Result Type

When a Parse response is too large for inline delivery, result.type is "url" and you must fetch the content from result.url. The threshold is approximately 6MB response size.
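
Client code therefore needs to handle both delivery modes. A minimal sketch, assuming the result is a dictionary with type and url keys as described above and that the URL serves plain JSON; resolve_result and fetch_json are hypothetical helpers, and any type other than "url" is treated as inline:

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download and decode a JSON document from a URL."""
    with urlopen(url) as response:
        return json.load(response)


def resolve_result(result, fetch=fetch_json):
    """Return the parse result, following result["url"] when type is "url".

    The fetch argument is injectable so the logic can be exercised offline.
    """
    if result.get("type") == "url":
        return fetch(result["url"])
    return result  # already inline


# Offline demonstration with a stand-in fetcher instead of a real download.
fake_fetch = lambda url: {"chunks": [], "fetched_from": url}
resolved = resolve_result({"type": "url", "url": "https://example.invalid/result.json"}, fetch=fake_fetch)
print(resolved)
```

Passing the fetcher as an argument keeps the branching logic separate from network access, which makes it easy to test without calling the API.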

Z

Zero Data Retention (ZDR)

Reducto’s default data policy. Uploaded documents and job results are deleted within 12 hours. Enable persist_results: true to keep results longer (requires Studio opt-in).
