## Documentation Index
Fetch the complete documentation index at: https://docs.reducto.ai/llms.txt
Use this file to discover all available pages before exploring further.
This page is a dense, structured reference designed for AI coding agents. It contains everything needed to integrate Reducto without navigating multiple pages.
## Product Summary
Reducto converts documents (PDFs, images, spreadsheets, DOCX, and 30+ other formats) into structured data via a REST API.
- Base URL: `https://platform.reducto.ai`
- Auth: `Authorization: Bearer $REDUCTO_API_KEY`
- SDKs: Python (`pip install reductoai`), Node.js (`npm install reductoai`), Go (`go get github.com/reductoai/reducto-go-sdk`)
- Input: Upload a file via `/upload` to get a `file_id`, then pass it to any endpoint. You can also pass public URLs or presigned S3/GCS/Azure URLs directly.
## Authentication
- Create a free account at studio.reducto.ai
- In the Studio sidebar, click API Keys, then Create new API key
- Set the key as an environment variable:
```sh
# macOS / Linux
export REDUCTO_API_KEY="your_api_key_here"

# Windows (PowerShell)
$env:REDUCTO_API_KEY="your_api_key_here"
```
The Python and Node.js SDKs automatically read `REDUCTO_API_KEY` from the environment. For the Go SDK, pass it explicitly:
```go
client := reducto.NewClient(option.WithAPIKey(os.Getenv("REDUCTO_API_KEY")))
```
For direct REST calls, pass it as a Bearer token:
```bash
curl -X POST https://platform.reducto.ai/parse \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "https://example.com/doc.pdf"}'
```
## Supported File Types
| Category | Formats |
|---|---|
| PDF | `.pdf` |
| Documents | `.docx`, `.doc`, `.dotx`, `.rtf`, `.txt`, `.wpd` |
| Spreadsheets | `.xlsx`, `.xlsm`, `.xls`, `.xltx`, `.xltm`, `.csv`, `.qpw` |
| Presentations | `.pptx`, `.ppt` |
| Images | `.png`, `.jpg`/`.jpeg`, `.gif`, `.bmp`, `.tiff`, `.heic`, `.psd`, `.pcx`, `.ppm`, `.apng`, `.cur`, `.dcx`, `.ftex`, `.pixar` |
Upload limit: 100MB direct, 5GB via presigned URL. Multi-page TIFFs are processed as multi-page documents.
## Which Endpoint Should I Use?
| I want to… | Endpoint | Method | Key config |
|---|---|---|---|
| Get all text, tables, and figures from a document, separated into chunks with bounding-box coordinates | `/parse` | POST | `enhance.agentic` for a stronger model pass that corrects mistakes |
| Extract specific fields into JSON using a schema | `/extract` | POST | `instructions.schema` (JSON Schema) |
| Divide a document into named sections by page range | `/split` | POST | `split_description` (section definitions) |
| Fill PDF or DOCX forms | `/edit` | POST | `edit_instructions` (natural language) |
| Classify a document's type before processing | `/classify` | POST | `classification_schema` (categories + criteria) |
| Upload a local file for processing | `/upload` | POST (multipart) | `file` field |
| Process asynchronously with webhooks | `/parse_async`, `/extract_async`, `/split_async`, `/edit_async` | POST | webhook URL |
| Check job status or retrieve results | `/job/{job_id}` | GET | - |
## Quick Start (Python)
```python
from pathlib import Path

import requests

from reducto import Reducto

client = Reducto()  # reads REDUCTO_API_KEY from env

# --- Option A: Pass a URL directly (no upload needed) ---
parse_result = client.parse.run(input="https://example.com/document.pdf")

# --- Option B: Upload a local file first ---
upload = client.upload(file=Path("document.pdf"))
parse_result = client.parse.run(input=upload.file_id)

# --- Handle the response (important: check result.type for large docs) ---
if parse_result.result.type == "url":
    # Large documents return a URL instead of inline content
    chunks = requests.get(parse_result.result.url).json()
else:
    chunks = parse_result.result.chunks

for chunk in chunks:
    # Use dict access for URL results, attribute access for inline results
    content = chunk["content"] if isinstance(chunk, dict) else chunk.content
    print(content)

# --- Extract: pull specific fields ---
extract_result = client.extract.run(
    input=upload.file_id,
    instructions={
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string", "description": "The invoice number"},
                "total": {"type": "number", "description": "Total amount due"},
            }
        }
    },
)

# The result is a list; take the first item for single-document extraction
data = extract_result.result[0]
print(data["invoice_number"], data["total"])

# --- Split: find section boundaries ---
split_result = client.split.run(
    input=upload.file_id,
    split_description=[
        {"name": "Summary", "description": "Executive summary section"},
        {"name": "Financials", "description": "Financial statements and tables"},
    ],
)
for split in split_result.result.splits:
    print(f"{split.name}: pages {split.pages}")

# --- Classify: identify document type ---
classify_result = client.classify.run(
    input=upload.file_id,
    classification_schema=[
        {"category": "invoice", "criteria": ["billing info", "itemized charges"]},
        {"category": "contract", "criteria": ["legal terms", "signatures"]},
    ],
)
print(classify_result.result)

# --- Edit: fill a form ---
# NOTE: Edit uses "document_url" instead of "input" (unlike other endpoints)
edit_result = client.edit.run(
    document_url=upload.file_id,
    edit_instructions="Fill Name: John Doe, Date: 2024-01-15, Check 'Yes' for US Citizen",
)
print(edit_result.document_url)  # URL to download the filled document
```
## SDK Naming Conventions
| SDK | Property names | Install | Notes |
|---|---|---|---|
| Python | snake_case (`array_extract`, `file_id`) | `pip install reductoai` | Client auto-reads the `REDUCTO_API_KEY` env var |
| Node.js | snake_case (`array_extract`, `file_id`) | `npm install reductoai` | All methods return promises; use `await` |
| Go | PascalCase (`ArrayExtract`, `FileID`) | `go get github.com/reductoai/reducto-go-sdk` | Wrap values with `reducto.F()`; use `shared.UnionString()` for document URLs |
| REST | snake_case in JSON body | - | `Authorization: Bearer $REDUCTO_API_KEY` header |
## Parse Parameters

`POST /parse`. Convert documents into structured JSON with text, tables, and figures.
### Core Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | string | required | File ID (`reducto://...`), public URL, presigned URL, or `jobid://...` to reprocess |
### enhance group

| Parameter | Type | Default | Description |
|---|---|---|---|
| `enhance.agentic` | array | `[]` | List of agentic scopes. Each item has a `scope` field. |
| `enhance.agentic[].scope` | `"text"` \| `"table"` \| `"figure"` | - | AI correction scope. `text`: OCR cleanup for scanned docs. `table`: fix misaligned columns. `figure`: chart data extraction. Adds latency and cost. |
| `enhance.agentic[].prompt` | string \| null | `null` | Custom prompt for agentic processing |
| `enhance.agentic[].advanced_chart_agent` | bool | `false` | Structured chart data extraction (`figure` scope only) |
| `enhance.summarize_figures` | bool | `true` | Generate natural-language descriptions of figures for RAG |
| `enhance.intelligent_ordering` | bool | `false` | Use a vision model to improve reading-order accuracy |
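For example, a minimal sketch of enabling agentic enhancement via the Python SDK (passing the nested `enhance` group as a plain dict is an assumption; check the SDK signature):

```python
# Sketch: agentic correction for tables and figures on a parse call.
# Assumes the SDK accepts the `enhance` group as a nested dict.
parse_result = client.parse.run(
    input=upload.file_id,
    enhance={
        "agentic": [
            {"scope": "table"},                                 # fix misaligned columns
            {"scope": "figure", "advanced_chart_agent": True},  # structured chart data
        ],
        "summarize_figures": True,
    },
)
```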
### retrieval group

| Parameter | Type | Default | Description |
|---|---|---|---|
| `retrieval.chunking.chunk_mode` | `"disabled"` \| `"variable"` \| `"section"` \| `"page"` \| `"block"` \| `"page_sections"` | `"disabled"` | `disabled`: one chunk for the entire doc. `variable`: semantic boundaries (best for RAG). `section`: split at headers. `page`: one chunk per page. `page_sections`: sections within each page. |
| `retrieval.chunking.chunk_size` | int \| null | `null` | Target chunk size in characters. Defaults to the 250-1500 range in `variable` mode. |
| `retrieval.chunking.chunk_overlap` | int | `0` | Characters of overlap between adjacent chunks |
| `retrieval.filter_blocks` | string[] | `[]` | Block types to exclude from `content`/`embed`. Options: `"Header"`, `"Footer"`, `"Title"`, `"Section Header"`, `"Page Number"`, `"List Item"`, `"Figure"`, `"Table"`, `"Key Value"`, `"Text"`, `"Comment"`, `"Signature"` |
| `retrieval.embedding_optimized` | bool | `false` | Optimize output for embedding models |
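A typical RAG-oriented configuration might look like the sketch below (again assuming nested groups are passed as dicts):

```python
# Sketch: variable (semantic) chunking with boilerplate blocks filtered out.
parse_result = client.parse.run(
    input=upload.file_id,
    retrieval={
        "chunking": {"chunk_mode": "variable", "chunk_size": 1000, "chunk_overlap": 100},
        "filter_blocks": ["Header", "Footer", "Page Number"],
    },
)
```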
### formatting group

| Parameter | Type | Default | Description |
|---|---|---|---|
| `formatting.table_output_format` | `"dynamic"` \| `"html"` \| `"md"` \| `"json"` \| `"csv"` \| `"jsonbbox"` | `"dynamic"` | `dynamic`: auto-selects `md` or `html` based on complexity. `html`: best for complex/merged cells. `md`: simple tables. `json`: programmatic cell access. |
| `formatting.add_page_markers` | bool | `false` | Add page markers to output |
| `formatting.merge_tables` | bool | `false` | Merge consecutive tables with the same column count |
| `formatting.include` | string[] | `[]` | Include: `"change_tracking"`, `"highlight"`, `"comments"`, `"hyperlinks"`, `"signatures"`, `"ignore_watermarks"` |
### spreadsheet group

| Parameter | Type | Default | Description |
|---|---|---|---|
| `spreadsheet.split_large_tables.enabled` | bool | `true` | Split large tables into smaller tables |
| `spreadsheet.split_large_tables.size` | int | `50` | Rows per chunk for split tables |
| `spreadsheet.clustering` | `"accurate"` \| `"fast"` \| `"disabled"` | `"accurate"` | Algorithm for splitting sheets into tables. `accurate` uses more powerful models (5x cost). |
| `spreadsheet.include` | string[] | `[]` | Include: `"cell_colors"`, `"formula"`, `"dropdowns"` |
### settings group

| Parameter | Type | Default | Description |
|---|---|---|---|
| `settings.page_range` | object \| null | `null` | `{"start": 1, "end": 10}` (1-indexed). Process specific pages only. |
| `settings.return_images` | string[] | `[]` | Return image URLs for block types: `"figure"`, `"table"`, `"page"` |
| `settings.ocr_system` | `"standard"` \| `"legacy"` | `"standard"` | `standard`: best multilingual OCR. `legacy`: Germanic languages only. |
| `settings.extraction_mode` | `"ocr"` \| `"hybrid"` | `"hybrid"` | `hybrid`: combines OCR with embedded PDF text (recommended). `ocr`: OCR only. |
| `settings.persist_results` | bool | `false` | Keep results indefinitely (by default they expire after 24h) |
| `settings.force_url_result` | bool | `false` | Always return results as a URL |
| `settings.timeout` | float \| null | `null` | Custom timeout in seconds |
| `settings.document_password` | string \| null | `null` | Password for encrypted documents |
| `settings.embed_pdf_metadata` | bool | `false` | Embed OCR metadata into the returned PDF |
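The groups compose on a single call. A combined sketch (dict-style parameter nesting is an assumption):

```python
# Sketch: HTML tables, larger spreadsheet splits, and image URLs for figures,
# limited to the first ten pages.
parse_result = client.parse.run(
    input=upload.file_id,
    formatting={"table_output_format": "html", "merge_tables": True},
    spreadsheet={"split_large_tables": {"enabled": True, "size": 100}},
    settings={"page_range": {"start": 1, "end": 10}, "return_images": ["figure"]},
)
```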
## Parse Response Shape
```json
{
  "job_id": "uuid",
  "duration": 3.89,
  "result": {
    "type": "full", // "full" (inline) or "url" (fetch from URL)
    "chunks": [
      {
        "content": "# Heading\n\nText content...", // Markdown-formatted
        "embed": "Heading. Text content...", // Embedding-optimized
        "blocks": [
          {
            "type": "Title", // Title, Section Header, Text, Table, Figure, Key Value, etc.
            "content": "Heading",
            "bbox": {"left": 0.1, "top": 0.05, "width": 0.3, "height": 0.04, "page": 1},
            "confidence": "high" // "high" or "low"
          }
        ]
      }
    ]
  },
  "usage": {"num_pages": 3, "credits": 4.0},
  "studio_link": "https://studio.reducto.ai/job/..."
}
```
When `result.type` is `"url"`, chunks are not inline. Fetch them from the URL:
```python
import requests

if parse_result.result.type == "url":
    # chunks are plain dicts when fetched via URL
    chunks = requests.get(parse_result.result.url).json()
else:
    # chunks are SDK objects with attribute access
    chunks = parse_result.result.chunks
```
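If you want a single code path for both shapes, a small normalizer helps. A sketch; `model_dump()` assumes the SDK's chunk objects are Pydantic models:

```python
import requests

def get_chunks(parse_result) -> list[dict]:
    """Return chunks as a uniform list of dicts for both result types."""
    if parse_result.result.type == "url":
        return requests.get(parse_result.result.url).json()
    # Assumption: inline chunks are Pydantic models exposing model_dump().
    return [chunk.model_dump() for chunk in parse_result.result.chunks]

for chunk in get_chunks(parse_result):
    print(chunk["content"])
```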
## Extract Parameters

`POST /extract`. Pull specific fields from documents into structured JSON using a schema.

Extract runs Parse internally. If a value doesn't appear in the Parse output, Extract cannot extract it.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | string \| string[] | required | File ID, URL, `jobid://...`, or an array of job IDs to combine |
| `instructions.schema` | object | `{}` | JSON Schema defining the fields to extract. Field names and descriptions directly influence accuracy. |
| `instructions.system_prompt` | string | `"Be precise and thorough."` | Document-level context for the LLM |
| `settings.array_extract` | bool | `false` | Segment the document for long arrays. Required when the schema has array fields in long documents. The schema must have at least one top-level array property. |
| `settings.deep_extract` | bool | `false` | Agentic mode that iteratively refines output for near-perfect accuracy. Higher cost/latency. |
| `settings.citations.enabled` | bool | `false` | Return source page, bbox, and text for each value. Mutually exclusive with chunking. |
| `settings.citations.numerical_confidence` | bool | `true` | Include 0-1 confidence scores (instead of `"high"`/`"low"`) |
| `settings.include_images` | bool | `false` | Include page images in the extraction context |
| `settings.optimize_for_latency` | bool | `false` | Higher-priority processing at 2x cost |
| `parsing` | object | `{}` | All Parse parameters (see above). Ignored if `input` is `jobid://...`. |
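For long documents with repeating rows, pair a top-level array property with `settings.array_extract`. A sketch (the dict-style `settings` argument is an assumption):

```python
# Sketch: extract an arbitrarily long list of line items.
extract_result = client.extract.run(
    input=upload.file_id,
    instructions={
        "schema": {
            "type": "object",
            "properties": {
                "line_items": {  # top-level array property, required by array_extract
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "description": {"type": "string"},
                            "amount": {"type": "number"},
                        },
                    },
                }
            },
        }
    },
    settings={"array_extract": True},
)
print(extract_result.result[0]["line_items"])
```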
## Extract Response Shape

```json
{
  "result": [
    {
      "invoice_number": "INV-2024-001",
      "total": 1250.00,
      "line_items": [{"description": "Widget", "amount": 500.00}]
    }
  ],
  "job_id": "uuid",
  "usage": {"num_fields": 4, "num_pages": 2, "credits": 10.0},
  "studio_link": "https://studio.reducto.ai/job/..."
}
```
With `citations.enabled: true`, each value is wrapped:

```json
{
  "result": {
    "total": {
      "value": 1250.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total: $1,250.00",
          "bbox": {"left": 0.04, "top": 0.26, "width": 0.45, "height": 0.50, "page": 2},
          "confidence": "high"
        }
      ]
    }
  }
}
```
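Reading cited values then looks like the sketch below (dict access is an assumption; inline SDK results may expose attributes instead):

```python
# Sketch: pull each extracted value plus the evidence behind it.
total = extract_result.result["total"]
print(total["value"])  # 1250.00
for citation in total["citations"]:
    print(citation["content"], "on page", citation["bbox"]["page"])
```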
## Split Parameters

`POST /split`. Divide documents into named sections by page number.

Split runs Parse internally, then uses an LLM to classify pages against your section descriptions.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | string | required | File ID, URL, or `jobid://...` |
| `split_description` | array | required | List of `{"name": "...", "description": "..."}` section definitions |
| `split_rules` | string | `"Split the document into the applicable sections..."` | Natural-language rules for splitting behavior |
| `settings.table_cutoff` | `"truncate"` \| `"preserve"` | `"truncate"` | `truncate`: first rows only (faster). `preserve`: all content. |
| `parsing` | object | `{}` | All Parse parameters. Ignored if `input` is `jobid://...`. |
## Split Response Shape
```json
{
  "result": {
    "splits": [
      {"name": "Executive Summary", "pages": [1, 2]},
      {"name": "Financial Statements", "pages": [3, 4, 5, 6]},
      {"name": "Risk Factors", "pages": [7, 8, 9]}
    ]
  },
  "job_id": "uuid",
  "usage": {"num_pages": 9, "credits": 6.0}
}
```
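Because Split runs Parse internally, you can parse once and reuse the job to avoid paying for a second parse. A sketch, assuming `jobid://` takes the `job_id` of a prior parse as described in the `input` rows above:

```python
# Sketch: reuse an existing parse job for splitting.
parse_result = client.parse.run(input=upload.file_id)
split_result = client.split.run(
    input=f"jobid://{parse_result.job_id}",  # `parsing` options are ignored here
    split_description=[
        {"name": "Summary", "description": "Executive summary section"},
        {"name": "Financials", "description": "Financial statements and tables"},
    ],
)
```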
## Edit Parameters

`POST /edit`. Fill PDF forms and modify DOCX documents.

Note: Edit uses `document_url` as its input parameter, not `input` like the other endpoints.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `document_url` | string | required | File ID or URL of the document to edit (note: `document_url`, not `input`) |
| `edit_instructions` | string | required | Natural-language instructions. Be explicit: "Fill Name: John Doe, Date: 2024-01-15" |
| `edit_options.color` | string | `"#FF0000"` | Highlight color for edits (DOCX only) |
| `edit_options.enable_overflow_pages` | bool | `false` | Create appendix pages for text exceeding field capacity (PDF only) |
| `form_schema` | array \| null | `null` | Pre-defined field locations for repeatable form filling. Skips detection. |
## Edit Response Shape
```json
{
  "document_url": "https://storage.reducto.ai/filled-form.pdf?...",
  "form_schema": [
    {
      "bbox": {"left": 0.1, "top": 0.2, "width": 0.4, "height": 0.03, "page": 1},
      "description": "Name field",
      "type": "text"
    }
  ],
  "usage": {"num_pages": 2, "credits": 8}
}
```
The `document_url` is a presigned URL valid for 24 hours. Save the returned `form_schema` to reuse for the same form type (skips field detection).
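A sketch of that reuse pattern (passing the saved schema back via a `form_schema` argument is an assumption based on the parameter table above):

```python
# First fill: field detection runs and a form_schema comes back.
first = client.edit.run(
    document_url=blank_form.file_id,  # hypothetical upload of the blank form
    edit_instructions="Fill Name: John Doe, Date: 2024-01-15",
)
saved_schema = first.form_schema  # persist this per form type

# Later fills of the same form type skip detection.
second = client.edit.run(
    document_url=another_form.file_id,  # hypothetical second upload
    edit_instructions="Fill Name: Jane Roe, Date: 2024-02-01",
    form_schema=saved_schema,
)
```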
## Classify Parameters

`POST /classify`. Categorize a document before processing.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `input` | string | required | File ID or URL |
| `classification_schema` | array | required | List of `{"category": "...", "criteria": ["...", "..."]}` |
| `page_range` | object \| null | `null` | Pages to use for classification context. Defaults to the first 5 pages. Max 10 pages. |
| `document_metadata` | string \| null | `null` | Optional metadata to include in the classification prompt |
## Classify Response Shape

```json
{
  "result": {
    "category": "invoice"
  },
  "job_id": "uuid",
  "duration": 1.23
}
```
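Classification pairs naturally with Extract for routing. A sketch; the `schemas` mapping is hypothetical, and attribute access on the classify result is an assumption:

```python
# Sketch: route to a category-specific extraction schema.
schemas = {  # hypothetical mapping you define per category
    "invoice": {"type": "object", "properties": {"total": {"type": "number"}}},
    "contract": {"type": "object", "properties": {"effective_date": {"type": "string"}}},
}
category = classify_result.result.category
extract_result = client.extract.run(
    input=upload.file_id,
    instructions={"schema": schemas[category]},
)
```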
## Async Processing
Most endpoints have async variants (`/parse_async`, `/extract_async`, `/split_async`, `/edit_async`). Classify is synchronous only. Async endpoints return a `job_id` immediately and process in the background.
```python
import time

# Submit an async job
job = client.parse.run_job(input=upload.file_id)
print(job.job_id)

# Poll for results
while True:
    result = client.job.get(job.job_id)
    if result.status in ("Completed", "Failed"):
        break
    time.sleep(2)
```
Configure webhooks for push-based delivery instead of polling.
## Error Codes
| HTTP Status | Meaning | Common cause |
|---|---|---|
| 401 | Unauthorized | Missing or invalid `REDUCTO_API_KEY` |
| 422 | Validation error | Invalid parameters, schema too large, or constraint violation |
| 429 | Rate limited | Too many concurrent requests. Retry with backoff. |
| 500 | Server error | Transient issue. Retry with backoff. |
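For 429/500 responses, a simple exponential backoff over the REST API might look like this sketch (the retry policy and timeout values are illustrative, not official guidance):

```python
import os
import time

import requests

def parse_with_retry(payload: dict, max_attempts: int = 5) -> dict:
    """POST /parse, retrying rate limits (429) and server errors (500) with backoff."""
    for attempt in range(max_attempts):
        resp = requests.post(
            "https://platform.reducto.ai/parse",
            headers={"Authorization": f"Bearer {os.environ['REDUCTO_API_KEY']}"},
            json=payload,
            timeout=300,
        )
        if resp.status_code not in (429, 500):
            resp.raise_for_status()  # surface 401/422 immediately
            return resp.json()
        time.sleep(2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("Reducto request failed after retries")

result = parse_with_retry({"input": "https://example.com/doc.pdf"})
```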
## Useful Links