Classify

Classify determines what kind of document you are looking at before any downstream processing begins. You specify categories with natural language criteria, and Reducto returns the best match. Classify is typically the first step in a document workflow. It routes documents to the right pipeline with the right configurations, so that Parse, Extract, and Split each run with settings tuned for the specific document type.

When to Use Classify

Use Classify when different document types need different handling. Instead of parsing every document with the same settings, you identify the document type first and then choose settings to match. Common use cases:

Document triage during onboarding. Quickly classify documents into categories when users upload files on your platform. For example, sort uploads into different types of legal documents.
Conditional parsing configurations. Classify handwritten doctor’s notes versus other forms, then enable agentic text in Parse with specific prompts downstream for the doctor’s notes.
Schema routing for extraction. Classify your documents upfront to apply different extraction schemas downstream. Passports get one schema, immigration forms get another.
Pipeline branching. Use classification results to decide whether a document needs Split, Extract, or both, and with which Parse configurations. For example, some financial documents may need specific splits, while others may not.

Use Classify on its own to label documents as part of a larger user flow, or apply Classify before your existing Reducto pipeline.
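The schema-routing use case can be sketched as a lookup table keyed by category name. This is a minimal illustration, not Reducto's API: the category names, field names, and the `schema_for` helper are all hypothetical.

```python
# Hypothetical mapping from Classify category names to extraction schemas.
# Category and field names are illustrative assumptions.
EXTRACT_SCHEMAS = {
    "passport": {
        "type": "object",
        "properties": {
            "full_name": {"type": "string"},
            "passport_number": {"type": "string"},
        },
    },
    "immigration_form": {
        "type": "object",
        "properties": {
            "form_number": {"type": "string"},
            "applicant_name": {"type": "string"},
        },
    },
}


def schema_for(category: str) -> dict:
    """Return the extraction schema registered for a classified category."""
    try:
        return EXTRACT_SCHEMAS[category]
    except KeyError:
        # Fail loudly if the router and the classification schema drift apart.
        raise ValueError(f"no extraction schema registered for {category!r}") from None


print(sorted(schema_for("passport")["properties"]))
```

Keeping the mapping in one place makes it easier to notice when a category is added to the classification schema without a matching downstream configuration.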

Quick Start

Play around at the Classify demo.

from reducto import Reducto

client = Reducto()

response = client.classify.run(
    input="https://example.com/document.pdf",
    classification_schema=[
        {
            "category": "invoice",
            "criteria": [
                "contains billing information",
                "has itemized charges",
                "includes payment details or amounts due",
            ],
        },
        {
            "category": "contract",
            "criteria": [
                "contains legal terms and conditions",
                "includes signature lines or parties involved",
                "references obligations or agreements",
            ],
        },
        {
            "category": "receipt",
            "criteria": [
                "shows a completed transaction",
                "includes items purchased with prices",
                "has a total amount paid",
            ],
        },
    ],
)
print(response)

import Reducto from 'reductoai';

const client = new Reducto();

const response = await client.classify.run({
  input: 'https://example.com/document.pdf',
  classification_schema: [
    {
      category: 'invoice',
      criteria: [
        'contains billing information',
        'has itemized charges',
        'includes payment details or amounts due',
      ],
    },
    {
      category: 'contract',
      criteria: [
        'contains legal terms and conditions',
        'includes signature lines or parties involved',
        'references obligations or agreements',
      ],
    },
    {
      category: 'receipt',
      criteria: [
        'shows a completed transaction',
        'includes items purchased with prices',
        'has a total amount paid',
      ],
    },
  ],
});
console.log(response);

curl -X POST https://platform.reducto.ai/classify \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "https://example.com/document.pdf",
    "classification_schema": [
      {
        "category": "invoice",
        "criteria": [
          "contains billing information",
          "has itemized charges",
          "includes payment details or amounts due"
        ]
      },
      {
        "category": "contract",
        "criteria": [
          "contains legal terms and conditions",
          "includes signature lines or parties involved",
          "references obligations or agreements"
        ]
      },
      {
        "category": "receipt",
        "criteria": [
          "shows a completed transaction",
          "includes items purchased with prices",
          "has a total amount paid"
        ]
      }
    ]
  }'

What this does:

Pass the document as a URL (for a local file, upload it first and pass the returned file reference instead)
Call /classify with the document reference and a list of categories, each with criteria describing what makes a document belong to that category
Get back the best-matching category

{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "result": {
    "category": "invoice"
  },
  "usage": {
    "num_pages": 5,
    "num_categories": 3,
    "credits": 2.5
  }
}

Visualize your classification with the Classify demo. Classify works on every file type that Parse supports.
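The sample response above can be read with plain dictionary access when you call the HTTP endpoint directly. The sketch below assumes only the fields shown in that sample; the `summarize` helper is hypothetical.

```python
import json

# The sample /classify response shown above, as a raw JSON string.
raw = """
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "result": {"category": "invoice"},
  "usage": {"num_pages": 5, "num_categories": 3, "credits": 2.5}
}
"""


def summarize(response: dict) -> str:
    """Build a one-line summary from the category and usage fields."""
    usage = response["usage"]
    return (
        f"{response['result']['category']}: "
        f"{usage['num_pages']} pages, "
        f"{usage['num_categories']} categories, "
        f"{usage['credits']} credits"
    )


response = json.loads(raw)
print(summarize(response))  # invoice: 5 pages, 3 categories, 2.5 credits
```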

Response Format Details

Full breakdown of confidence scores, per-criterion reasoning, and all response fields.

Request Parameters

from reducto import Reducto

client = Reducto()

response = client.classify.run(
    input="...",                      # Required: upload response or URL
    classification_schema=[...],       # Required: categories with criteria
    page_range={"start": 1, "end": 10},  # Optional: pages to use as context
    document_metadata="...",           # Optional: additional context
)

import Reducto from 'reductoai';

const client = new Reducto();

const response = await client.classify.run({
  input: '...',                        // Required: upload response or URL
  classification_schema: [...],        // Required: categories with criteria
  page_range: { start: 1, end: 10 },   // Optional: pages to use as context
  document_metadata: '...',            // Optional: additional context
});

curl -X POST https://platform.reducto.ai/classify \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "YOUR_FILE_ID_OR_URL",
    "classification_schema": [{"category": "...", "criteria": ["..."]}],
    "page_range": {"start": 1, "end": 10},
    "document_metadata": "optional context string"
  }'

input (required)

The document to classify. Accepts the same formats as other Reducto endpoints:

Format	Example	When to use
Upload response	`reducto://abc123`	Local files uploaded via `/upload`
Public URL	`https://example.com/doc.pdf`	Publicly accessible documents
Presigned URL	`https://bucket.s3.../doc.pdf?X-Amz-...`	Files in your cloud storage
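A quick client-side check can catch malformed inputs before a request is sent. The sketch below is an assumption about what is worth checking, not part of the Reducto SDK: `input_kind` is a hypothetical helper that only inspects the URL scheme, so it treats public and presigned URLs the same way.

```python
from urllib.parse import urlparse


def input_kind(value: str) -> str:
    """Return "upload" for reducto:// references and "url" for http(s) URLs."""
    parsed = urlparse(value)
    if parsed.scheme == "reducto" and parsed.netloc:
        return "upload"
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return "url"
    raise ValueError(f"unsupported input: {value!r}")


print(input_kind("reducto://abc123"))
print(input_kind("https://example.com/doc.pdf"))
```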

classification_schema (required)

A list of categories you want to classify the document into. Each category has a name and a list of criteria describing what makes a document belong to that category.

{
  "classification_schema": [
    {
      "category": "invoice",
      "criteria": [
        "contains billing information",
        "has itemized charges",
        "includes payment details"
      ]
    },
    {
      "category": "contract",
      "criteria": [
        "contains legal terms and conditions",
        "includes signature lines",
        "references parties and obligations"
      ]
    }
  ]
}

Field	Type	Description
`category`	`string`	The category name/label that documents will be classified into (e.g., `"invoice"`, `"contract"`, `"receipt"`)
`criteria`	`list[string]`	A list of criteria, keywords, or descriptions that define what characteristics a document must have to be classified into this category
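Because the schema is plain data, it can be checked before it is sent. The following is a minimal sketch of such a check. The rules it enforces (non-empty names, unique names, at least one criterion) are client-side assumptions rather than documented API constraints, and `validate_schema` is a hypothetical helper.

```python
def validate_schema(schema: list[dict]) -> list[str]:
    """Return a list of problems found in a classification schema (empty if none)."""
    problems = []
    seen = set()
    for i, entry in enumerate(schema):
        category = entry.get("category")
        criteria = entry.get("criteria")
        if not isinstance(category, str) or not category.strip():
            problems.append(f"entry {i}: 'category' must be a non-empty string")
        elif category in seen:
            problems.append(f"entry {i}: duplicate category {category!r}")
        else:
            seen.add(category)
        criteria_ok = (
            isinstance(criteria, list)
            and len(criteria) > 0
            and all(isinstance(c, str) and c.strip() for c in criteria)
        )
        if not criteria_ok:
            problems.append(f"entry {i}: 'criteria' must be a non-empty list of strings")
    return problems


example = [
    {"category": "invoice", "criteria": ["contains billing information"]},
    {"category": "contract", "criteria": ["contains legal terms and conditions"]},
]
print(validate_schema(example))  # []
```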

page_range

page_range (optional, defaults to the first 5 pages): The page range to use as context for classification. Accepts an object with start and end fields (1-indexed, inclusive). Selecting more than 10 pages returns an error. Only applies to PDFs. See Classify Configuration for examples and cost implications.
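Since the range is 1-indexed and inclusive, the number of selected pages is `end - start + 1`. A small guard like the one below can reject an oversized range before the request is made. `make_page_range` is a hypothetical helper; the 10-page limit comes from the description above.

```python
MAX_PAGES = 10  # selecting more than 10 pages returns an error, per the text above


def make_page_range(start: int, end: int) -> dict:
    """Build a page_range object, validating 1-indexed inclusive bounds."""
    if start < 1:
        raise ValueError("start must be >= 1 (pages are 1-indexed)")
    if end < start:
        raise ValueError("end must be >= start")
    count = end - start + 1  # inclusive on both ends
    if count > MAX_PAGES:
        raise ValueError(f"{count} pages selected; the limit is {MAX_PAGES}")
    return {"start": start, "end": end}


print(make_page_range(1, 10))  # {'start': 1, 'end': 10}
```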

document_metadata

document_metadata (optional, defaults to null): A metadata string to include in classification prompts. Use this to provide additional context about the document that may help with classification.
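Putting the four parameters together, a request body can be assembled so that optional fields are sent only when they are set. This is a sketch for callers using the HTTP endpoint directly: `build_classify_body` is a hypothetical helper, and the metadata string is an invented example.

```python
import json


def build_classify_body(input_ref, classification_schema, page_range=None, document_metadata=None):
    """Assemble a /classify request body, omitting optional fields left as None."""
    body = {"input": input_ref, "classification_schema": classification_schema}
    if page_range is not None:
        body["page_range"] = page_range
    if document_metadata is not None:
        body["document_metadata"] = document_metadata
    return body


body = build_classify_body(
    "https://example.com/document.pdf",
    [{"category": "invoice", "criteria": ["has itemized charges"]}],
    page_range={"start": 1, "end": 3},
    document_metadata="Uploaded through the accounts payable inbox",  # illustrative only
)
print(json.dumps(body, indent=2))
```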

How It Works

Document ingestion. Classify accepts your document and processes it to understand its content.
Category evaluation. Each category in your classification_schema is evaluated against the document. The criteria you provide guide what the model looks for.
Best match selection. Classify returns the single best-matching category. It compares all categories and picks the one whose criteria best describe the document.

Classify is optimized for latency. It’s a lightweight operation compared to full parsing or extraction, designed to return results fast enough to use as an inline routing step without adding meaningful overhead to your pipeline.

Classify is synchronous only. Results return quickly enough that async polling and webhooks are unnecessary.

Using Classify in a Pipeline

Classify is most useful when combined with other Reducto endpoints. Here’s a common pattern: classify first, then route to different Parse/Extract configurations based on the result.

from pathlib import Path
from reducto import Reducto

client = Reducto()

# Step 1: Upload
upload = client.upload(file=Path("document.pdf"))

# Step 2: Classify
classification = client.classify.run(
    input=upload.file_id,
    classification_schema=[
        {
            "category": "handwritten_notes",
            "criteria": ["contains handwritten text", "informal layout", "pen or pencil marks"],
        },
        {
            "category": "printed_form",
            "criteria": ["structured layout with fields", "typed text", "checkboxes or form widgets"],
        },
    ],
)
category = classification.result.category

# Step 3: Route to appropriate pipeline
if category == "handwritten_notes":
    result = client.parse.run(
        input=upload.file_id,
        enhance={"agentic": [{"scope": "text", "prompt": "This is a handwritten medical note. Pay close attention to medication names and dosages."}]},
    )
else:
    # Standard extraction for printed forms
    result = client.extract.run(
        input=upload.file_id,
        instructions={
            "schema": {
                "type": "object",
                "properties": {
                    "patient_name": {"type": "string"},
                    "date_of_birth": {"type": "string"},
                    "insurance_id": {"type": "string"},
                },
            }
        },
    )

print(result)

import fs from 'fs';
import Reducto from 'reductoai';

const client = new Reducto();

// Step 1: Upload
const upload = await client.upload({ file: fs.createReadStream('document.pdf') });

// Step 2: Classify
const classification = await client.classify.run({
  input: upload.file_id,
  classification_schema: [
    {
      category: 'handwritten_notes',
      criteria: ['contains handwritten text', 'informal layout', 'pen or pencil marks'],
    },
    {
      category: 'printed_form',
      criteria: ['structured layout with fields', 'typed text', 'checkboxes or form widgets'],
    },
  ],
});
const category = classification.result.category;

// Step 3: Route to appropriate pipeline
if (category === 'handwritten_notes') {
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: { agentic: [{ scope: 'text', prompt: 'This is a handwritten medical note. Pay close attention to medication names and dosages.' }] },
  });
  console.log(result);
} else {
  // Standard extraction for printed forms
  const result = await client.extract.run({
    input: upload.file_id,
    instructions: {
      schema: {
        type: 'object',
        properties: {
          patient_name: { type: 'string' },
          date_of_birth: { type: 'string' },
          insurance_id: { type: 'string' },
        },
      },
    },
  });
  console.log(result);
}

# Step 1: Upload
FILE_ID=$(curl -s -X POST https://platform.reducto.ai/upload \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -F "file=@document.pdf" | jq -r '.file_id')

# Step 2: Classify
CATEGORY=$(curl -s -X POST https://platform.reducto.ai/classify \
  -H "Authorization: Bearer $REDUCTO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "'"$FILE_ID"'",
    "classification_schema": [
      {
        "category": "handwritten_notes",
        "criteria": ["contains handwritten text", "informal layout", "pen or pencil marks"]
      },
      {
        "category": "printed_form",
        "criteria": ["structured layout with fields", "typed text", "checkboxes or form widgets"]
      }
    ]
  }' | jq -r '.result.category')

# Step 3: Route to appropriate pipeline
if [ "$CATEGORY" = "handwritten_notes" ]; then
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "'"$FILE_ID"'",
      "enhance": {
        "agentic": [{"scope": "text", "prompt": "This is a handwritten medical note. Pay close attention to medication names and dosages."}]
      }
    }'
else
  curl -X POST https://platform.reducto.ai/extract \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "'"$FILE_ID"'",
      "instructions": {
        "schema": {
          "type": "object",
          "properties": {
            "patient_name": {"type": "string"},
            "date_of_birth": {"type": "string"},
            "insurance_id": {"type": "string"}
          }
        }
      }
    }'
fi

FAQs

How does Classify differ from Extract or Split?

Reducto can already classify documents after parsing, using Extract or Split. Extract pulls specific data out of a document alongside an enum-based classification. Split works best when you have multi-document packets. However, both of these endpoints require Parse. Classify runs before Parse, so it is optimized for latency and cost. Classify also returns structured reasoning and confidence.

What are the latency and accuracy trade-offs?

Classify is faster than Parse or Extract because it doesn’t need to do full document parsing. It focuses on high-level document understanding to match against your categories. Accuracy depends on how well your criteria describe each category. Well-defined, mutually exclusive categories with specific criteria yield the best results.
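One rough way to check a schema for overlap is to look for criteria that appear under more than one category. The sketch below is a simple lint under that assumption: it only catches criteria that are identical after lowercasing and whitespace normalization, and `shared_criteria` is a hypothetical helper.

```python
from collections import defaultdict


def shared_criteria(schema: list[dict]) -> dict:
    """Map each normalized criterion used by 2+ categories to those categories."""
    owners = defaultdict(list)
    for entry in schema:
        for criterion in entry["criteria"]:
            key = " ".join(criterion.lower().split())  # normalize case and spacing
            if entry["category"] not in owners[key]:
                owners[key].append(entry["category"])
    return {key: cats for key, cats in owners.items() if len(cats) > 1}


overlapping = [
    {"category": "invoice", "criteria": ["Has a total amount", "has itemized charges"]},
    {"category": "receipt", "criteria": ["has a  total amount", "shows a completed transaction"]},
]
print(shared_criteria(overlapping))  # {'has a total amount': ['invoice', 'receipt']}
```

A shared criterion is not necessarily wrong, but it does nothing to separate the two categories, so it is a good candidate for rewording.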

What types of classifications can I specify?

Any categories you want. You define both the category names and the criteria. Common examples include:

Document type: invoice, contract, receipt, tax form
Content characteristics: handwritten vs. typed, single-page vs. multi-page
Domain-specific: ACORD forms vs. declarations pages, W-2 vs. 1099
Processing needs: needs OCR enhancement vs. standard processing

Do I need to specify categories, or can Reducto infer them?

You must provide your own categories via classification_schema. Classify matches your document against the categories you define. It doesn’t auto-discover document types. This is by design: your categories should reflect your specific pipeline needs. A healthcare company and a law firm would classify the same document differently based on their downstream processing requirements.

Can I use Classify with Parse and Extract?

Yes. This is the primary use case. Classify sits at the top of your pipeline to route documents, then you call Parse, Extract, or Split with configurations tailored to each document type. See Using Classify in a Pipeline above for a complete example.

What file types does Classify support?

Classify is optimized for PDFs and images (PNG, JPEG, etc.), but it also accepts every other format that Reducto’s Parse endpoint supports.

What happens if my document doesn't match any category well?

Classify always returns the best match from your schema, even if the fit isn’t perfect. If you need to handle unrecognized documents, add an "other" category with criteria like "does not match any of the other document types". Be sure to enumerate the types it should not match.
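The fallback category can be generated from the rest of the schema so that the enumerated types stay in sync. This is a minimal sketch: `with_other_category` is a hypothetical helper, and the criterion wording is adapted from the suggestion above.

```python
def with_other_category(schema: list[dict], name: str = "other") -> list[dict]:
    """Return a copy of the schema with a fallback category naming every other type."""
    names = [entry["category"] for entry in schema]
    if name in names:
        return list(schema)  # fallback already present; nothing to add
    fallback = {
        "category": name,
        "criteria": [
            "does not match any of the other document types: " + ", ".join(names)
        ],
    }
    return [*schema, fallback]


base = [
    {"category": "invoice", "criteria": ["has itemized charges"]},
    {"category": "contract", "criteria": ["contains legal terms and conditions"]},
]
print(with_other_category(base)[-1])
```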

Related

Response Format

Confidence scores, per-criterion reasoning, and all response fields.

Best Practices

Write better classification schemas for more accurate results.

Classify Configuration

Page ranges and classification schema options.

Credit Usage

How Classify credits are calculated.
