
# Reducto API Reference for Coding Agents

> Complete, structured reference for AI coding agents integrating the Reducto document processing API

This page is a dense, structured reference designed for AI coding agents. It contains everything needed to integrate Reducto without navigating multiple pages.

## Product Summary

Reducto converts documents (PDFs, images, spreadsheets, DOCX, and 30+ other formats) into structured data via a REST API.

* **Base URL:** `https://platform.reducto.ai`
* **Auth:** `Authorization: Bearer $REDUCTO_API_KEY`
* **SDKs:** Python (`pip install reductoai`), Node.js (`npm install reductoai`), Go (`go get github.com/reductoai/reducto-go-sdk`)
* **Input:** Upload a file via `/upload` to get a `file_id`, then pass it to any endpoint. You can also pass public URLs or presigned S3/GCS/Azure URLs directly.

***

## Authentication

1. Create a free account at [studio.reducto.ai](https://studio.reducto.ai/)
2. In the Studio sidebar, click **API Keys**, then **Create new API key**
3. Set the key as an environment variable:

```bash
# macOS / Linux
export REDUCTO_API_KEY="your_api_key_here"

# Windows (PowerShell)
$env:REDUCTO_API_KEY="your_api_key_here"
```

The Python and Node.js SDKs automatically read `REDUCTO_API_KEY` from the environment. For the Go SDK, pass it explicitly:

```go
client := reducto.NewClient(option.WithAPIKey(os.Getenv("REDUCTO_API_KEY")))
```

For direct REST calls, pass it as a Bearer token:

```bash
curl -X POST https://platform.reducto.ai/parse \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"input": "https://example.com/doc.pdf"}'
```
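The same request can be sketched in Python with only the standard library. `build_parse_request` is a hypothetical helper name, not part of any SDK; it builds the request without sending it, so the headers and body can be inspected first.

```python
import json
import urllib.request

BASE_URL = "https://platform.reducto.ai"


def build_parse_request(document_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /parse request with Bearer auth."""
    body = json.dumps({"input": document_url}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/parse",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


# Placeholder key for illustration; read the real one from REDUCTO_API_KEY.
request = build_parse_request("https://example.com/doc.pdf", "your_api_key_here")
print(request.get_method(), request.full_url)
```

To send it, pass the request to `urllib.request.urlopen(request)`; this needs a valid key and network access.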

***

## Supported File Types

| Category      | Formats                                                                                                                      |
| ------------- | ---------------------------------------------------------------------------------------------------------------------------- |
| PDF           | `.pdf`                                                                                                                       |
| Documents     | `.docx`, `.doc`, `.dotx`, `.rtf`, `.txt`, `.wpd`                                                                             |
| Spreadsheets  | `.xlsx`, `.xlsm`, `.xls`, `.xltx`, `.xltm`, `.csv`, `.qpw`                                                                   |
| Presentations | `.pptx`, `.ppt`                                                                                                              |
| Images        | `.png`, `.jpg`/`.jpeg`, `.gif`, `.bmp`, `.tiff`, `.heic`, `.psd`, `.pcx`, `.ppm`, `.apng`, `.cur`, `.dcx`, `.ftex`, `.pixar` |

Upload limits: 100 MB for direct uploads, 5 GB via [presigned URL](/upload/large-files). Multi-page TIFFs are processed as multi-page documents.
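A client-side pre-check can catch unsupported or oversized files before any request is made. The sketch below is illustrative only: `upload_strategy` is a hypothetical helper, and it assumes the limits are decimal megabytes and inclusive, which this page does not specify.

```python
from pathlib import Path

# Assumption: limits are decimal units (1 MB = 10**6 bytes) and inclusive.
DIRECT_UPLOAD_LIMIT = 100 * 10**6   # 100 MB direct
PRESIGNED_UPLOAD_LIMIT = 5 * 10**9  # 5 GB via presigned URL

# Extensions copied from the table above.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".docx", ".doc", ".dotx", ".rtf", ".txt", ".wpd",
    ".xlsx", ".xlsm", ".xls", ".xltx", ".xltm", ".csv", ".qpw",
    ".pptx", ".ppt",
    ".png", ".jpg", ".jpeg", ".gif", ".bmp", ".tiff", ".heic", ".psd",
    ".pcx", ".ppm", ".apng", ".cur", ".dcx", ".ftex", ".pixar",
}


def upload_strategy(filename: str, size_bytes: int) -> str:
    """Return "direct" or "presigned", or raise ValueError if the file cannot be uploaded."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {extension or filename}")
    if size_bytes <= DIRECT_UPLOAD_LIMIT:
        return "direct"
    if size_bytes <= PRESIGNED_UPLOAD_LIMIT:
        return "presigned"
    raise ValueError(f"File is larger than the {PRESIGNED_UPLOAD_LIMIT} byte limit")


print(upload_strategy("report.pdf", 2_500_000))
```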

***

## Which Endpoint Should I Use?

| I want to...                                                                                          | Endpoint                                                        | Method           | Key config                                                      |
| ----------------------------------------------------------------------------------------------------- | --------------------------------------------------------------- | ---------------- | --------------------------------------------------------------- |
| Get all text, tables, and figures from a document separated into chunks with bounding box coordinates | `/parse`                                                        | POST             | `enhance.agentic` for a stronger model pass to correct mistakes |
| Extract specific fields into JSON using a specific schema                                             | `/extract`                                                      | POST             | `instructions.schema` (JSON Schema)                             |
| Divide a document into named sections by page range                                                   | `/split`                                                        | POST             | `split_description` (section definitions)                       |
| Fill PDF or DOCX forms                                                                                | `/edit`                                                         | POST             | `edit_instructions` (natural language)                          |
| Classify a document's type before processing                                                          | `/classify`                                                     | POST             | `classification_schema` (categories + criteria)                 |
| Upload a local file for processing                                                                    | `/upload`                                                       | POST (multipart) | `file` field                                                    |
| Process asynchronously with webhooks                                                                  | `/parse_async`, `/extract_async`, `/split_async`, `/edit_async` | POST             | `webhook` URL                                                   |
| Check job status or retrieve results                                                                  | `/job/{job_id}`                                                 | GET              | -                                                               |

***

## Quick Start (Python)

```python
from pathlib import Path

import requests
from reducto import Reducto

client = Reducto()  # reads REDUCTO_API_KEY from env

# --- Option A: Pass a URL directly (no upload needed) ---
parse_result = client.parse.run(input="https://example.com/document.pdf")

# --- Option B: Upload a local file first ---
upload = client.upload(file=Path("document.pdf"))
parse_result = client.parse.run(input=upload.file_id)

# --- Handle the response (important: check result.type for large docs) ---
if parse_result.result.type == "url":
    # Large documents return a URL instead of inline content
    chunks = requests.get(parse_result.result.url).json()
else:
    chunks = parse_result.result.chunks

for chunk in chunks:
    # Use dict access for URL results, attribute access for inline results
    content = chunk["content"] if isinstance(chunk, dict) else chunk.content
    print(content)

# --- Extract: pull specific fields ---
extract_result = client.extract.run(
    input=upload.file_id,
    instructions={
        "schema": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string", "description": "The invoice number"},
                "total": {"type": "number", "description": "Total amount due"}
            }
        }
    }
)
# result is a list, access first item for single-document extraction
data = extract_result.result[0]
print(data["invoice_number"], data["total"])

# --- Split: find section boundaries ---
split_result = client.split.run(
    input=upload.file_id,
    split_description=[
        {"name": "Summary", "description": "Executive summary section"},
        {"name": "Financials", "description": "Financial statements and tables"}
    ]
)
for split in split_result.result.splits:
    print(f"{split.name}: pages {split.pages}")

# --- Classify: identify document type ---
classify_result = client.classify.run(
    input=upload.file_id,
    classification_schema=[
        {"category": "invoice", "criteria": ["billing info", "itemized charges"]},
        {"category": "contract", "criteria": ["legal terms", "signatures"]}
    ]
)
print(classify_result.result)

# --- Edit: fill a form ---
# NOTE: Edit uses "document_url" instead of "input" (unlike other endpoints)
edit_result = client.edit.run(
    document_url=upload.file_id,
    edit_instructions="Fill Name: John Doe, Date: 2024-01-15, Check 'Yes' for US Citizen"
)
print(edit_result.document_url)  # URL to download filled document
```

***

## SDK Naming Conventions

| SDK     | Property names                           | Install                                      | Notes                                                                        |
| ------- | ---------------------------------------- | -------------------------------------------- | ---------------------------------------------------------------------------- |
| Python  | snake\_case (`array_extract`, `file_id`) | `pip install reductoai`                      | Client auto-reads `REDUCTO_API_KEY` env var                                  |
| Node.js | snake\_case (`array_extract`, `file_id`) | `npm install reductoai`                      | All methods return promises, use `await`                                     |
| Go      | PascalCase (`ArrayExtract`, `FileID`)    | `go get github.com/reductoai/reducto-go-sdk` | Wrap values with `reducto.F()`, use `shared.UnionString()` for document URLs |
| REST    | snake\_case in JSON body                 | -                                            | `Authorization: Bearer $REDUCTO_API_KEY` header                              |

***

## Parse Parameters

`POST /parse`. Convert documents into structured JSON with text, tables, and figures.

### Core Parameters

| Parameter | Type   | Default      | Description                                                                         |
| --------- | ------ | ------------ | ----------------------------------------------------------------------------------- |
| `input`   | string | **required** | File ID (`reducto://...`), public URL, presigned URL, or `jobid://...` to reprocess |

### enhance group

| Parameter                                | Type                                | Default | Description                                                                                                                                         |
| ---------------------------------------- | ----------------------------------- | ------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
| `enhance.agentic`                        | array                               | `[]`    | List of agentic scopes. Each item has a `scope` field.                                                                                              |
| `enhance.agentic[].scope`                | `"text"` \| `"table"` \| `"figure"` | -       | AI correction scope. `text`: OCR cleanup for scanned docs. `table`: fix misaligned columns. `figure`: chart data extraction. Adds latency and cost. |
| `enhance.agentic[].prompt`               | string \| null                      | `null`  | Custom prompt for agentic processing                                                                                                                |
| `enhance.agentic[].advanced_chart_agent` | bool                                | `false` | Structured chart data extraction (figure scope only)                                                                                                |
| `enhance.summarize_figures`              | bool                                | `true`  | Generate natural language descriptions of figures for RAG                                                                                           |
| `enhance.intelligent_ordering`           | bool                                | `false` | Use vision model to improve reading order accuracy                                                                                                  |

### retrieval group

| Parameter                          | Type                                                                                      | Default      | Description                                                                                                                                                                                                               |
| ---------------------------------- | ----------------------------------------------------------------------------------------- | ------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `retrieval.chunking.chunk_mode`    | `"disabled"` \| `"variable"` \| `"section"` \| `"page"` \| `"block"` \| `"page_sections"` | `"disabled"` | `disabled`: one chunk for entire doc. `variable`: semantic boundaries (best for RAG). `section`: split at headers. `page`: one chunk per page. `page_sections`: sections within each page.                                |
| `retrieval.chunking.chunk_size`    | int \| null                                                                               | `null`       | Target chunk size in characters. Defaults to 250-1500 range in variable mode.                                                                                                                                             |
| `retrieval.chunking.chunk_overlap` | int                                                                                       | `0`          | Characters of overlap between adjacent chunks                                                                                                                                                                             |
| `retrieval.filter_blocks`          | string\[]                                                                                 | `[]`         | Block types to exclude from `content`/`embed`. Options: `"Header"`, `"Footer"`, `"Title"`, `"Section Header"`, `"Page Number"`, `"List Item"`, `"Figure"`, `"Table"`, `"Key Value"`, `"Text"`, `"Comment"`, `"Signature"` |
| `retrieval.embedding_optimized`    | bool                                                                                      | `false`      | Optimize output for embedding models                                                                                                                                                                                      |

### formatting group

| Parameter                        | Type                                                                     | Default     | Description                                                                                                                                           |
| -------------------------------- | ------------------------------------------------------------------------ | ----------- | ----------------------------------------------------------------------------------------------------------------------------------------------------- |
| `formatting.table_output_format` | `"dynamic"` \| `"html"` \| `"md"` \| `"json"` \| `"csv"` \| `"jsonbbox"` | `"dynamic"` | `dynamic`: auto-selects md or html based on complexity. `html`: best for complex/merged cells. `md`: simple tables. `json`: programmatic cell access. |
| `formatting.add_page_markers`    | bool                                                                     | `false`     | Add page markers to output                                                                                                                            |
| `formatting.merge_tables`        | bool                                                                     | `false`     | Merge consecutive tables with same column count                                                                                                       |
| `formatting.include`             | string\[]                                                                | `[]`        | Include: `"change_tracking"`, `"highlight"`, `"comments"`, `"hyperlinks"`, `"signatures"`, `"ignore_watermarks"`                                      |

### spreadsheet group

| Parameter                                | Type                                     | Default      | Description                                                                               |
| ---------------------------------------- | ---------------------------------------- | ------------ | ----------------------------------------------------------------------------------------- |
| `spreadsheet.split_large_tables.enabled` | bool                                     | `true`       | Split large tables into smaller tables                                                    |
| `spreadsheet.split_large_tables.size`    | int                                      | `50`         | Rows per chunk for split tables                                                           |
| `spreadsheet.clustering`                 | `"accurate"` \| `"fast"` \| `"disabled"` | `"accurate"` | Algorithm for splitting sheets into tables. Accurate uses more powerful models (5x cost). |
| `spreadsheet.include`                    | string\[]                                | `[]`         | Include: `"cell_colors"`, `"formula"`, `"dropdowns"`                                      |

### settings group

| Parameter                     | Type                       | Default      | Description                                                                   |
| ----------------------------- | -------------------------- | ------------ | ----------------------------------------------------------------------------- |
| `settings.page_range`         | object \| null             | `null`       | `{"start": 1, "end": 10}` (1-indexed). Process specific pages only.           |
| `settings.return_images`      | string\[]                  | `[]`         | Return image URLs for block types: `"figure"`, `"table"`, `"page"`            |
| `settings.ocr_system`         | `"standard"` \| `"legacy"` | `"standard"` | `standard`: best multilingual OCR. `legacy`: Germanic languages only.         |
| `settings.extraction_mode`    | `"ocr"` \| `"hybrid"`      | `"hybrid"`   | `hybrid`: combines OCR with embedded PDF text (recommended). `ocr`: OCR only. |
| `settings.persist_results`    | bool                       | `false`      | Keep results indefinitely (default: expire after 24h)                         |
| `settings.force_url_result`   | bool                       | `false`      | Always return results as a URL                                                |
| `settings.timeout`            | float \| null              | `null`       | Custom timeout in seconds                                                     |
| `settings.document_password`  | string \| null             | `null`       | Password for encrypted documents                                              |
| `settings.embed_pdf_metadata` | bool                       | `false`      | Embed OCR metadata into returned PDF                                          |
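The dotted names in the tables above are read here as paths into nested JSON objects in the request body. That reading is an assumption based on names such as `enhance.agentic[].scope`. A minimal sketch of building a `/parse` body this way, using a hypothetical `nest_options` helper:

```python
import json
from typing import Any


def nest_options(flat: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted parameter names ("a.b.c") into nested dicts."""
    nested: dict[str, Any] = {}
    for dotted_name, value in flat.items():
        *parents, leaf = dotted_name.split(".")
        node = nested
        for key in parents:
            node = node.setdefault(key, {})
        node[leaf] = value
    return nested


parse_body = {
    "input": "https://example.com/doc.pdf",
    **nest_options({
        "enhance.agentic": [{"scope": "table"}],
        "retrieval.chunking.chunk_mode": "variable",
        "formatting.table_output_format": "html",
        "settings.page_range": {"start": 1, "end": 10},
    }),
}
print(json.dumps(parse_body, indent=2))
```

Only non-default options need to appear in the body; everything omitted falls back to the defaults listed in the tables.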

### Parse Response Shape

```json
{
  "job_id": "uuid",
  "duration": 3.89,
  "result": {
    "type": "full",           // "full" (inline) or "url" (fetch from URL)
    "chunks": [
      {
        "content": "# Heading\n\nText content...",   // Markdown-formatted
        "embed": "Heading. Text content...",          // Embedding-optimized
        "blocks": [
          {
            "type": "Title",       // Title, Section Header, Text, Table, Figure, Key Value, etc.
            "content": "Heading",
            "bbox": {"left": 0.1, "top": 0.05, "width": 0.3, "height": 0.04, "page": 1},
            "confidence": "high"   // "high" or "low"
          }
        ]
      }
    ]
  },
  "usage": {"num_pages": 3, "credits": 4.0},
  "studio_link": "https://studio.reducto.ai/job/..."
}
```
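The `bbox` values in the sample all fall between 0 and 1, which suggests coordinates normalized to the page size with the origin at the top-left corner. Under that assumption (not confirmed on this page), a box can be mapped onto a rendered page image as sketched below; `bbox_to_pixels` is a hypothetical helper.

```python
def bbox_to_pixels(bbox: dict, page_width_px: int, page_height_px: int) -> tuple[int, int, int, int]:
    """Convert a normalized bbox into (left, top, right, bottom) pixel coordinates.

    Assumes left/top/width/height are fractions of the page dimensions.
    """
    left = round(bbox["left"] * page_width_px)
    top = round(bbox["top"] * page_height_px)
    right = round((bbox["left"] + bbox["width"]) * page_width_px)
    bottom = round((bbox["top"] + bbox["height"]) * page_height_px)
    return left, top, right, bottom


# bbox taken from the sample response above, on a 1000 x 2000 px page image.
sample_bbox = {"left": 0.1, "top": 0.05, "width": 0.3, "height": 0.04, "page": 1}
print(bbox_to_pixels(sample_bbox, page_width_px=1000, page_height_px=2000))
```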

When `result.type` is `"url"`, chunks are not inline. Fetch them from the URL:

```python
import requests

if parse_result.result.type == "url":
    chunks = requests.get(parse_result.result.url).json()
    # chunks are plain dicts when fetched via URL
else:
    chunks = parse_result.result.chunks
    # chunks are SDK objects with attribute access
```
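Because the two result types yield different chunk representations, downstream code is simpler with a single accessor that handles both. The sketch below uses hypothetical helper names and stand-in data; it collects every block whose `confidence` is `"low"`, for example to queue them for review.

```python
from types import SimpleNamespace
from typing import Any


def field(obj: Any, name: str, default: Any = None) -> Any:
    """Read `name` from a plain dict (URL results) or an object (inline results)."""
    if isinstance(obj, dict):
        return obj.get(name, default)
    return getattr(obj, name, default)


def low_confidence_blocks(chunks: list) -> list[tuple[Any, Any, Any]]:
    """Return (page, block type, content) for every block marked "low"."""
    flagged = []
    for chunk in chunks:
        for block in field(chunk, "blocks", []) or []:
            if field(block, "confidence") == "low":
                page = field(field(block, "bbox"), "page")
                flagged.append((page, field(block, "type"), field(block, "content")))
    return flagged


# Stand-in data: one dict chunk (as fetched from a URL) and one object chunk
# (SimpleNamespace stands in for an SDK object with attribute access).
url_chunks = [{"blocks": [{"type": "Table", "content": "Total: 12",
                           "confidence": "low", "bbox": {"page": 2}}]}]
inline_chunks = [SimpleNamespace(blocks=[SimpleNamespace(
    type="Text", content="Hello", confidence="high", bbox=SimpleNamespace(page=1))])]
print(low_confidence_blocks(url_chunks + inline_chunks))
```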

***

## Extract Parameters

`POST /extract`. Pull specific fields from documents into structured JSON using a schema.

Extract runs Parse internally. If a value doesn't appear in the Parse output, Extract cannot return it.

| Parameter                                 | Type                | Default                      | Description                                                                                                                                              |
| ----------------------------------------- | ------------------- | ---------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `input`                                   | string \| string\[] | **required**                 | File ID, URL, `jobid://...`, or array of job IDs to combine                                                                                              |
| `instructions.schema`                     | object              | `{}`                         | JSON Schema defining fields to extract. Field names and descriptions directly influence accuracy.                                                        |
| `instructions.system_prompt`              | string              | `"Be precise and thorough."` | Document-level context for the LLM                                                                                                                       |
| `settings.array_extract`                  | bool                | `false`                      | Segment document for long arrays. **Required** when schema has array fields in long documents. Schema must have at least one top-level `array` property. |
| `settings.deep_extract`                   | bool                | `false`                      | Agentic mode that iteratively refines output for near-perfect accuracy. Higher cost/latency.                                                             |
| `settings.citations.enabled`              | bool                | `false`                      | Return source page, bbox, and text for each value. **Mutually exclusive with chunking.**                                                                 |
| `settings.citations.numerical_confidence` | bool                | `true`                       | Include 0-1 confidence scores (vs "high"/"low")                                                                                                          |
| `settings.include_images`                 | bool                | `false`                      | Include page images in extraction context                                                                                                                |
| `settings.optimize_for_latency`           | bool                | `false`                      | Higher priority processing at 2x cost                                                                                                                    |
| `parsing`                                 | object              | `{}`                         | All Parse parameters (see above). Ignored if input is `jobid://`.                                                                                        |
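As an illustration of how `instructions.schema` and `settings.array_extract` fit together, the sketch below builds an `/extract` request body and turns on `array_extract` only when the schema has a top-level `array` property. The helper names and the `long_document` flag are hypothetical; deciding what counts as a long document is left to the caller.

```python
def has_top_level_array(schema: dict) -> bool:
    """True if any top-level property of the JSON Schema has type "array"."""
    properties = schema.get("properties", {})
    return any(prop.get("type") == "array" for prop in properties.values())


def build_extract_body(input_ref: str, schema: dict, long_document: bool = False) -> dict:
    """Build a JSON body for POST /extract."""
    body = {"input": input_ref, "instructions": {"schema": schema}}
    # array_extract needs at least one top-level array property in the schema.
    if long_document and has_top_level_array(schema):
        body["settings"] = {"array_extract": True}
    return body


invoice_schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string", "description": "The invoice number"},
        "line_items": {
            "type": "array",
            "description": "One entry per billed line",
            "items": {
                "type": "object",
                "properties": {
                    "description": {"type": "string"},
                    "amount": {"type": "number"},
                },
            },
        },
    },
}

print(build_extract_body("https://example.com/invoice.pdf", invoice_schema, long_document=True))
```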

### Extract Response Shape

```json
{
  "result": [
    {
      "invoice_number": "INV-2024-001",
      "total": 1250.00,
      "line_items": [{"description": "Widget", "amount": 500.00}]
    }
  ],
  "job_id": "uuid",
  "usage": {"num_fields": 4, "num_pages": 2, "credits": 10.0},
  "studio_link": "https://studio.reducto.ai/job/..."
}
```

With `settings.citations.enabled: true`, each value is wrapped in an object that holds the value and its citations:

```json
{
  "result": {
    "total": {
      "value": 1250.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total: $1,250.00",
          "bbox": {"left": 0.04, "top": 0.26, "width": 0.45, "height": 0.50, "page": 2},
          "confidence": "high"
        }
      ]
    }
  }
}
```
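Code written against the plain result shape usually needs the wrapped values unwrapped again. The sketch below separates plain values from their citations, assuming a wrapped value is any object carrying both a `value` and a `citations` key (a schema field literally named that way would be misread). `unwrap` is a hypothetical helper.

```python
from typing import Any, Optional


def unwrap(node: Any, path: str = "", citations: Optional[dict] = None) -> tuple:
    """Return (plain_values, citations_by_path) for a citation-wrapped result."""
    if citations is None:
        citations = {}
    if isinstance(node, dict):
        if node.keys() >= {"value", "citations"}:
            # Wrapped leaf: record its citations under the field path.
            citations[path] = node["citations"]
            return node["value"], citations
        plain = {}
        for key, child in node.items():
            child_path = f"{path}.{key}" if path else key
            plain[key] = unwrap(child, child_path, citations)[0]
        return plain, citations
    if isinstance(node, list):
        plain_list = [unwrap(child, f"{path}[{i}]", citations)[0] for i, child in enumerate(node)]
        return plain_list, citations
    return node, citations


wrapped_result = {
    "total": {
        "value": 1250.00,
        "citations": [{
            "type": "Table",
            "content": "Total: $1,250.00",
            "bbox": {"left": 0.04, "top": 0.26, "width": 0.45, "height": 0.50, "page": 2},
            "confidence": "high",
        }],
    }
}
values, sources = unwrap(wrapped_result)
print(values)
print(sources["total"][0]["bbox"]["page"])
```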

***

## Split Parameters

`POST /split`. Divide documents into named sections by page number.

Split runs Parse internally, then uses an LLM to classify pages against your section descriptions.

| Parameter               | Type                         | Default                                                | Description                                                         |
| ----------------------- | ---------------------------- | ------------------------------------------------------ | ------------------------------------------------------------------- |
| `input`                 | string                       | **required**                                           | File ID, URL, or `jobid://...`                                      |
| `split_description`     | array                        | **required**                                           | List of `{"name": "...", "description": "..."}` section definitions |
| `split_rules`           | string                       | `"Split the document into the applicable sections..."` | Natural language rules for splitting behavior                       |
| `settings.table_cutoff` | `"truncate"` \| `"preserve"` | `"truncate"`                                           | `truncate`: first rows only (faster). `preserve`: all content.      |
| `parsing`               | object                       | `{}`                                                   | All Parse parameters. Ignored if input is `jobid://`.               |

### Split Response Shape

```json
{
  "result": {
    "splits": [
      {"name": "Executive Summary", "pages": [1, 2]},
      {"name": "Financial Statements", "pages": [3, 4, 5, 6]},
      {"name": "Risk Factors", "pages": [7, 8, 9]}
    ]
  },
  "job_id": "uuid",
  "usage": {"num_pages": 9, "credits": 6.0}
}
```
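Each split lists individual page numbers, so a common follow-up is collapsing them into contiguous ranges. `page_ranges` below is a hypothetical helper. Whether such a range can be passed on as `settings.page_range` depends on `end` being inclusive, which this page does not state.

```python
def page_ranges(pages: list[int]) -> list[tuple[int, int]]:
    """Collapse page numbers into sorted, inclusive (start, end) ranges."""
    ranges: list[tuple[int, int]] = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page continues the current run: extend its end.
            ranges[-1] = (ranges[-1][0], page)
        else:
            ranges.append((page, page))
    return ranges


splits = [
    {"name": "Executive Summary", "pages": [1, 2]},
    {"name": "Financial Statements", "pages": [3, 4, 5, 6]},
    {"name": "Risk Factors", "pages": [7, 8, 9]},
]
for split in splits:
    print(split["name"], page_ranges(split["pages"]))
```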

***

## Edit Parameters

`POST /edit`. Fill PDF forms and modify DOCX documents.

**Note:** Edit uses `document_url` as its input parameter, not `input` like other endpoints.

| Parameter                            | Type          | Default      | Description                                                                           |
| ------------------------------------ | ------------- | ------------ | ------------------------------------------------------------------------------------- |
| `document_url`                       | string        | **required** | File ID or URL of the document to edit (this is `document_url`, not `input`)          |
| `edit_instructions`                  | string        | **required** | Natural language instructions. Be explicit: `"Fill Name: John Doe, Date: 2024-01-15"` |
| `edit_options.color`                 | string        | `"#FF0000"`  | Highlight color for edits (DOCX only)                                                 |
| `edit_options.enable_overflow_pages` | bool          | `false`      | Create appendix pages for text exceeding field capacity (PDF only)                    |
| `form_schema`                        | array \| null | `null`       | Pre-defined field locations for repeatable form filling. Skips detection.             |

### Edit Response Shape

```json
{
  "document_url": "https://storage.reducto.ai/filled-form.pdf?...",
  "form_schema": [
    {
      "bbox": {"left": 0.1, "top": 0.2, "width": 0.4, "height": 0.03, "page": 1},
      "description": "Name field",
      "type": "text"
    }
  ],
  "usage": {"num_pages": 2, "credits": 8}
}
```

The `document_url` is a presigned URL valid for 24 hours. Save the returned `form_schema` to reuse for the same form type (skips field detection).
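One way to reuse a `form_schema` is to cache it on disk per form type and attach it to later requests. The sketch below is a hedged illustration: the cache layout, the helper names, and the `form_type` key are all hypothetical, and only the `document_url`, `edit_instructions`, and `form_schema` fields come from this page.

```python
import json
from pathlib import Path
from typing import Optional


def save_form_schema(cache_dir: Path, form_type: str, form_schema: list) -> Path:
    """Write the form_schema returned by /edit to <cache_dir>/<form_type>.json."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{form_type}.json"
    path.write_text(json.dumps(form_schema, indent=2), encoding="utf-8")
    return path


def load_form_schema(cache_dir: Path, form_type: str) -> Optional[list]:
    """Return the cached schema for this form type, or None if not cached yet."""
    path = cache_dir / f"{form_type}.json"
    if not path.exists():
        return None
    return json.loads(path.read_text(encoding="utf-8"))


def build_edit_body(document_url: str, edit_instructions: str,
                    cached_schema: Optional[list] = None) -> dict:
    """Build a JSON body for POST /edit, attaching form_schema when available."""
    body = {"document_url": document_url, "edit_instructions": edit_instructions}
    if cached_schema is not None:
        body["form_schema"] = cached_schema
    return body
```

On the first run for a form type, `load_form_schema` returns `None` and the request goes out without `form_schema`; saving the schema from that response lets later runs include it.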

***

## Classify Parameters

`POST /classify`. Categorize a document before processing.

| Parameter               | Type           | Default      | Description                                                                       |
| ----------------------- | -------------- | ------------ | --------------------------------------------------------------------------------- |
| `input`                 | string         | **required** | File ID or URL                                                                    |
| `classification_schema` | array          | **required** | List of `{"category": "...", "criteria": ["...", "..."]}`                         |
| `page_range`            | object \| null | `null`       | Pages to use for classification context. Defaults to first 5 pages. Max 10 pages. |
| `document_metadata`     | string \| null | `null`       | Optional metadata to include in classification prompt                             |

### Classify Response Shape

```json
{
  "result": {
    "category": "invoice"
  },
  "job_id": "uuid",
  "duration": 1.23
}
```
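Since classification usually precedes further processing, the returned category can drive which endpoint is called next. The mapping below is purely illustrative (the category-to-endpoint pairs are an assumption, not a recommendation from this page), and `next_endpoint` is a hypothetical helper.

```python
# Hypothetical routing table: which endpoint to call for each category.
NEXT_STEP = {
    "invoice": "/extract",
    "contract": "/split",
}


def next_endpoint(classify_response: dict, default: str = "/parse") -> str:
    """Pick the follow-up endpoint from a /classify response body."""
    category = classify_response["result"]["category"]
    return NEXT_STEP.get(category, default)


sample_response = {"result": {"category": "invoice"}, "job_id": "uuid", "duration": 1.23}
print(next_endpoint(sample_response))
```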

***

## Async Processing

Most endpoints have async variants (`/parse_async`, `/extract_async`, `/split_async`, `/edit_async`). Classify is synchronous only. Async endpoints return a `job_id` immediately and process in the background.

```python
import time

# Submit async job
job = client.parse.run_job(input=upload.file_id)
print(job.job_id)

# Poll for results
while True:
    result = client.job.get(job.job_id)
    if result.status in ("Completed", "Failed"):
        break
    time.sleep(2)
```

Alternatively, set a `webhook` URL on the async request to receive results by push instead of polling.
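For polling in production code, a bounded wait is safer than an open-ended loop. A minimal sketch, assuming only the two terminal statuses shown above (`"Completed"` and `"Failed"`); `wait_for_job` and its parameters are hypothetical, and the status source is injected so the function can be exercised without network access.

```python
import time
from typing import Callable

TERMINAL_STATUSES = ("Completed", "Failed")


def wait_for_job(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until a terminal status is seen or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {timeout_s} seconds")
        sleep(interval_s)
```

With the SDK client from the example above, a call could look like `wait_for_job(lambda: client.job.get(job.job_id).status)`.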

***

## Error Codes

| HTTP Status | Meaning          | Common cause                                                  |
| ----------- | ---------------- | ------------------------------------------------------------- |
| 401         | Unauthorized     | Missing or invalid `REDUCTO_API_KEY`                          |
| 422         | Validation error | Invalid parameters, schema too large, or constraint violation |
| 429         | Rate limited     | Too many concurrent requests. Retry with backoff.             |
| 500         | Server error     | Transient issue. Retry with backoff.                          |
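"Retry with backoff" for 429 and 500 responses can be sketched as exponential backoff with jitter. The helper below is a hypothetical illustration: the attempt count, base delay, and jitter range are arbitrary choices, and `send` stands for any callable that performs the request and returns `(status_code, payload)`.

```python
import random
import time
from typing import Any, Callable

RETRYABLE_STATUSES = {429, 500}


def call_with_backoff(
    send: Callable[[], tuple],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status: Any = None
    payload: Any = None
    for attempt in range(max_attempts):
        status, payload = send()
        if status not in RETRYABLE_STATUSES:
            return status, payload
        if attempt < max_attempts - 1:
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus up to 50% jitter.
            delay = base_delay_s * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
    return status, payload
```

Errors such as 401 and 422 are returned immediately, since repeating the same request would not change the outcome.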

***

## Useful Links

* [API Reference](/api-reference/parse): Full OpenAPI spec with request/response details
* [Parse Configuration](/configs/overview): All configuration options
* [Cookbooks](/cookbooks/overview): End-to-end tutorials (invoice extraction, form filling, RAG)
* [Error Codes](/reference/error-codes): Complete error catalog
* [Rate Limits](/reference/rate-limits): Request limits and quotas
* [Credit Usage](/reference/credit-usage): How credits are calculated
