Classify unlocks conditional logic in your pipeline. You specify criteria for document classification, and Classify returns the best match from the categories you provide. While Parse reads documents, Extract pulls data out, and Split divides documents into sections, Classify tells you what kind of document you’re looking at before any of that happens. This lets you route documents to the right pipeline with the right configurations upstream of further processing.
The Classify endpoint is currently available via cURL and Python requests. Official SDK support is coming soon.

When to Use Classify

Classify routes your documents to the right pipeline upstream of further processing. Instead of parsing all of your documents with the same settings, you can contextualize your documents before you parse them. Common use cases:
  • Document triage during onboarding. Quickly classify documents into categories when users upload files on your platform. For example, sort into different types of legal documents.
  • Conditional parsing configurations. Classify handwritten doctor’s notes versus other forms, then enable agentic text in Parse with specific prompts downstream for the doctor’s notes.
  • Schema routing for extraction. Classify your documents upfront to apply different extraction schemas downstream. Passports get one schema, immigration forms get another.
  • Pipeline branching. Use classification results to decide whether a document needs Split, Extract, or both, and with which Parse configurations. For example, some financial documents may need specific splits, while others may not.
Use Classify on its own to label documents as part of a larger user flow, or apply Classify before your existing Reducto pipeline.
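The schema-routing use case can be sketched as a lookup table from category to extraction schema. This is a minimal sketch under stated assumptions: the category names, the field names in each schema, and the helper `build_extract_payload` are hypothetical, and the payload shape simply mirrors the /extract request used in the pipeline example on this page.

```python
# Hypothetical mapping from a Classify category to the JSON schema that
# Extract should use for it. Category and field names are illustrative only.
EXTRACTION_SCHEMAS = {
    "passport": {
        "type": "object",
        "properties": {
            "full_name": {"type": "string"},
            "passport_number": {"type": "string"},
        },
    },
    "immigration_form": {
        "type": "object",
        "properties": {
            "applicant_name": {"type": "string"},
            "form_number": {"type": "string"},
        },
    },
}


def build_extract_payload(file_ref: str, category: str) -> dict:
    """Build an /extract request body for the category Classify returned."""
    try:
        schema = EXTRACTION_SCHEMAS[category]
    except KeyError:
        raise ValueError(f"no extraction schema registered for {category!r}") from None
    return {"input": file_ref, "instructions": {"schema": schema}}


payload = build_extract_payload("reducto://abc123", "passport")
print(sorted(payload["instructions"]["schema"]["properties"]))
```

Keeping the schemas in one table means adding a new document type is a data change, not a new branch in the routing code.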

Quick Start

Play around at the Classify demo.
import requests

API_KEY = "YOUR_REDUCTO_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

# Step 1: Upload your document
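# Open the file in a context manager so the handle is closed after the upload
with open("/path/to/document.pdf", "rb") as f:
    upload_resp = requests.post(
        "https://platform.reducto.ai/upload",
        headers=HEADERS,
        files={"file": f},
    )
upload_resp.raise_for_status()  # fail early on HTTP errors
file_id = upload_resp.json()["file_id"]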

# Step 2: Classify it
classify_resp = requests.post(
    "https://platform.reducto.ai/classify",
    headers={**HEADERS, "Content-Type": "application/json"},
    json={
        "input": file_id,
        "classification_schema": [
            {
                "category": "invoice",
                "criteria": [
                    "contains billing information",
                    "has itemized charges",
                    "includes payment details or amounts due",
                ],
            },
            {
                "category": "contract",
                "criteria": [
                    "contains legal terms and conditions",
                    "includes signature lines or parties involved",
                    "references obligations or agreements",
                ],
            },
            {
                "category": "receipt",
                "criteria": [
                    "shows a completed transaction",
                    "includes items purchased with prices",
                    "has a total amount paid",
                ],
            },
        ],
    },
)
print(classify_resp.json())
What this does:
  1. Upload the document to get a file_id
  2. Call /classify with the file reference and a list of categories, each with criteria describing what makes a document belong to that category
  3. Get back the best-matching category
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "result": {
    "category": "invoice"
  }
}
Visualize your classification with the Classify demo. Classify works on all file types we can Parse.
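Before branching on the result, it can help to validate the response shape. The following is a minimal sketch that assumes only the `result.category` field shown in the sample response above; the helper name `get_category` and the check against your own category names are illustrative, not part of the API.

```python
def get_category(response_body: dict, allowed: set[str]) -> str:
    """Return the category from a Classify response, validating its shape."""
    result = response_body.get("result")
    if not isinstance(result, dict) or "category" not in result:
        raise ValueError(f"unexpected Classify response: {response_body!r}")
    category = result["category"]
    # Guard against a label your routing code does not know how to handle.
    if category not in allowed:
        raise ValueError(f"category {category!r} is not in the schema")
    return category


# Sample body copied from the response shown above.
sample = {
    "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "result": {"category": "invoice"},
}
print(get_category(sample, {"invoice", "contract", "receipt"}))  # prints: invoice
```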

Response Format Details

Full breakdown of confidence scores, per-criterion reasoning, and all response fields.

Request Parameters

import requests

resp = requests.post(
    "https://platform.reducto.ai/classify",
    headers={
        "Authorization": "Bearer YOUR_REDUCTO_API_KEY",
        "Content-Type": "application/json",
    },
    json={
        "input": "...",                       # Required: file_id, URL, or presigned URL
        "classification_schema": [...],       # Required: categories with criteria
        "document_metadata": "...",           # Optional: additional context
    },
)

input (required)

The document to classify. Accepts the same formats as other Reducto endpoints:
| Format | Example | When to use |
| --- | --- | --- |
| Upload response | reducto://abc123 | Local files uploaded via /upload |
| Public URL | https://example.com/doc.pdf | Publicly accessible documents |
| Presigned URL | https://bucket.s3.../doc.pdf?X-Amz-... | Files in your cloud storage |
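A client-side check can catch a local file path passed as `input` by mistake before the request is sent. This sketch assumes that upload references use the `reducto://` form shown in the table above; the helper name `input_kind` and its return labels are hypothetical.

```python
from urllib.parse import urlparse


def input_kind(value: str) -> str:
    """Label an `input` string according to the formats in the table above."""
    scheme = urlparse(value).scheme.lower()
    if scheme == "reducto":
        return "upload_reference"
    if scheme in ("http", "https"):
        # Public and presigned URLs are both plain HTTP(S) URLs.
        return "url"
    raise ValueError(f"unsupported input format: {value!r}")


print(input_kind("reducto://abc123"))
print(input_kind("https://example.com/doc.pdf"))
```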

classification_schema (required)

A list of categories you want to classify the document into. Each category has a name and a list of criteria describing what makes a document belong to that category.
{
  "classification_schema": [
    {
      "category": "invoice",
      "criteria": [
        "contains billing information",
        "has itemized charges",
        "includes payment details"
      ]
    },
    {
      "category": "contract",
      "criteria": [
        "contains legal terms and conditions",
        "includes signature lines",
        "references parties and obligations"
      ]
    }
  ]
}
| Field | Type | Description |
| --- | --- | --- |
| category | string | The category name/label that documents will be classified into (e.g., "invoice", "contract", "receipt") |
| criteria | list[string] | A list of criteria, keywords, or descriptions that define what characteristics a document must have to be classified into this category |
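If you keep categories in a plain dictionary, a small builder can produce the list shape described above. This is a sketch only: `make_schema` is a hypothetical helper, and the emptiness checks are client-side conventions assumed here, not documented API constraints.

```python
def make_schema(categories: dict[str, list[str]]) -> list[dict]:
    """Turn {category: [criteria, ...]} into a classification_schema list."""
    schema = []
    for name, criteria in categories.items():
        if not name.strip():
            raise ValueError("category name must not be empty")
        # Drop blank criteria so every entry carries real guidance.
        cleaned = [c.strip() for c in criteria if c.strip()]
        if not cleaned:
            raise ValueError(f"category {name!r} needs at least one criterion")
        schema.append({"category": name, "criteria": cleaned})
    return schema


schema = make_schema({
    "invoice": ["contains billing information", "has itemized charges"],
    "contract": ["contains legal terms and conditions", "includes signature lines"],
})
print([entry["category"] for entry in schema])
```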

document_metadata

document_metadata (optional, defaults to null): A metadata string to include in classification prompts. Use this to provide additional context about the document that may help with classification.
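As an illustration, the request body with and without metadata might be assembled as follows. The helper `build_classify_payload` and the example metadata string are hypothetical, and omitting the key when no metadata is given is assumed to be equivalent to sending the default of null.

```python
import json


def build_classify_payload(file_ref, schema, document_metadata=None):
    """Assemble a /classify request body, adding metadata only when provided."""
    payload = {"input": file_ref, "classification_schema": schema}
    if document_metadata is not None:
        payload["document_metadata"] = document_metadata
    return payload


schema = [
    {"category": "invoice", "criteria": ["contains billing information"]},
    {"category": "receipt", "criteria": ["shows a completed transaction"]},
]
payload = build_classify_payload(
    "reducto://abc123",
    schema,
    # Illustrative context string; write whatever helps describe your document.
    document_metadata="Uploaded through the vendor billing portal.",
)
print(json.dumps(payload, indent=2))
```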

How It Works

  1. Document ingestion. Classify accepts your document and processes it to understand its content.
  2. Category evaluation. Each category in your classification_schema is evaluated against the document. The criteria you provide guide what the model looks for.
  3. Best match selection. Classify returns the single best-matching category. It compares all categories and picks the one whose criteria best describe the document.
Classify is optimized for latency. It’s a lightweight operation compared to full parsing or extraction, designed to return results fast enough to use as an inline routing step without adding meaningful overhead to your pipeline.
Classify is synchronous only. The endpoint is optimized for low latency, so classification results return fast enough that async polling or webhooks are unnecessary.
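Because the call is synchronous, a client can block on the request and read the category directly, with a timeout as a safety net. The sketch below uses the standard library's `urllib.request` in place of `requests` so that it has no third-party dependency. The function names and the 30-second default timeout are assumptions, not recommended values.

```python
import json
import urllib.request

CLASSIFY_URL = "https://platform.reducto.ai/classify"


def build_classify_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Prepare the POST request without sending it."""
    return urllib.request.Request(
        CLASSIFY_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify_blocking(api_key: str, payload: dict, timeout_s: float = 30.0) -> str:
    """Send the request and block until the category comes back."""
    request = build_classify_request(api_key, payload)
    # urlopen raises on HTTP errors and when the timeout is exceeded.
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        body = json.load(response)
    return body["result"]["category"]
```

There is no job polling loop here: the response to the single POST already contains the result.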

Using Classify in a Pipeline

Classify is most useful when combined with other Reducto endpoints. Here’s a common pattern: classify first, then route to different Parse/Extract configurations based on the result.
import requests

API_KEY = "YOUR_REDUCTO_API_KEY"
HEADERS = {"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"}

# Step 1: Upload
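# Open the file in a context manager so the handle is closed after the upload
with open("document.pdf", "rb") as f:
    upload_resp = requests.post(
        "https://platform.reducto.ai/upload",
        headers={"Authorization": f"Bearer {API_KEY}"},
        files={"file": f},
    )
upload_resp.raise_for_status()  # fail early on HTTP errors
file_id = upload_resp.json()["file_id"]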

# Step 2: Classify
classify_resp = requests.post(
    "https://platform.reducto.ai/classify",
    headers=HEADERS,
    json={
        "input": file_id,
        "classification_schema": [
            {
                "category": "handwritten_notes",
                "criteria": ["contains handwritten text", "informal layout", "pen or pencil marks"],
            },
            {
                "category": "printed_form",
                "criteria": ["structured layout with fields", "typed text", "checkboxes or form widgets"],
            },
        ],
    },
)
category = classify_resp.json()["result"]["category"]

# Step 3: Route to appropriate pipeline
if category == "handwritten_notes":
    # Use agentic text mode for handwritten content
    result = requests.post(
        "https://platform.reducto.ai/parse",
        headers=HEADERS,
        json={
            "input": file_id,
            "enhance": {
                "agentic": [
                    {
                        "scope": "text",
                        "prompt": "This is a handwritten medical note. Pay close attention to medication names and dosages.",
                    }
                ]
            },
        },
    )
else:
    # Standard extraction for printed forms
    result = requests.post(
        "https://platform.reducto.ai/extract",
        headers=HEADERS,
        json={
            "input": file_id,
            "instructions": {
                "schema": {
                    "type": "object",
                    "properties": {
                        "patient_name": {"type": "string"},
                        "date_of_birth": {"type": "string"},
                        "insurance_id": {"type": "string"},
                    },
                }
            },
        },
    )

print(result.json())

FAQs

Reducto can already classify documents after parsing, using Extract or Split. Extract pulls specific data out of a document alongside an enum-based classification, and Split works best when you have multi-document packets. However, both of these endpoints need Parse. Classify runs before Parse, so it is optimized for latency and cost. Classify also returns structured reasoning and confidence.
Classify is faster than Parse or Extract because it doesn’t need to do full document parsing. It focuses on high-level document understanding to match against your categories. Accuracy depends on how well your criteria describe each category. Well-defined, mutually exclusive categories with specific criteria will yield the best results.
Any categories you want. You define both the category names and the criteria. Common examples include:
  • Document type: invoice, contract, receipt, tax form
  • Content characteristics: handwritten vs. typed, single-page vs. multi-page
  • Domain-specific: ACORD forms vs. declarations pages, W-2 vs. 1099
  • Processing needs: needs OCR enhancement vs. standard processing
You must provide your own categories via classification_schema. Classify matches your document against the categories you define. It doesn’t auto-discover document types. This is by design: your categories should reflect your specific pipeline needs. A healthcare company and a law firm would classify the same document differently based on their downstream processing requirements.
Yes. This is the primary use case. Classify sits at the top of your pipeline to route documents, then you call Parse, Extract, or Split with configurations tailored to each document type. See Using Classify in a Pipeline above for a complete example.
Classify is optimized for PDFs and images (PNG, JPEG, etc.), but also supports the remaining formats supported by Reducto’s Parse endpoint.
Classify always returns the best match from your schema, even if the fit isn’t perfect. If you need to handle unrecognized documents, add an "other" category with criteria like "does not match any of the other document types". In those criteria, be sure to enumerate the types it should not match.
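One way to build such a catch-all is to derive it from the categories already in your schema, so the enumeration stays in sync. This is a minimal sketch: the helper `with_other_category` and the exact wording of the second criterion are assumptions, not prescribed phrasing.

```python
def with_other_category(schema: list[dict]) -> list[dict]:
    """Return a copy of the schema with a catch-all "other" category appended."""
    names = [entry["category"] for entry in schema]
    if "other" in names:
        return list(schema)  # already present; leave the schema unchanged
    other = {
        "category": "other",
        "criteria": [
            "does not match any of the other document types",
            # Enumerate the known types so the catch-all is explicit.
            "is not one of: " + ", ".join(names),
        ],
    }
    return [*schema, other]


schema = with_other_category([
    {"category": "invoice", "criteria": ["contains billing information"]},
    {"category": "receipt", "criteria": ["shows a completed transaction"]},
])
print(schema[-1]["criteria"][1])  # prints: is not one of: invoice, receipt
```

Downstream, your routing code can then treat "other" as a signal to fall back to a default pipeline or to flag the document for review.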

Response Format

Confidence scores, per-criterion reasoning, and all response fields.

Best Practices

Write better classification schemas for more accurate results.