The Classify endpoint is currently available via cURL and Python requests. Official SDK support is coming soon.
When to Use Classify
Classify routes your documents to the right pipeline upstream of further processing. Instead of parsing all of your documents with the same settings, you can contextualize your documents before you parse them. Common use cases:
- Document triage during onboarding. Quickly classify documents into categories when users upload files on your platform. For example, sort uploads into different types of legal documents.
- Conditional parsing configurations. Classify handwritten doctor’s notes versus other forms, then enable agentic text in Parse with specific prompts downstream for the doctor’s notes.
- Schema routing for extraction. Classify your documents upfront to apply different extraction schemas downstream. Passports get one schema, immigration forms get another.
- Pipeline branching. Use classification results to decide whether a document needs Split, Extract, or both, and with which Parse configurations. For example, some financial documents may need specific splits, while others may not.
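The routing use cases above can be sketched as a lookup from category name to downstream settings. This is a minimal illustration under assumptions: the `ROUTES` table, its keys (including `agentic_text` and the schema names), and the `route_document` helper are all hypothetical placeholders, not Reducto parameters. The category is passed in as a plain string because the response fields are described on the Response Format page rather than here.

```python
# Hypothetical routing table: category name -> downstream steps and settings.
# The setting names are placeholders, not documented Reducto options.
ROUTES = {
    "doctors_note": {"steps": ["parse"], "parse_settings": {"agentic_text": True}},
    "passport": {"steps": ["parse", "extract"], "extract_schema": "passport_schema"},
    "financial_packet": {
        "steps": ["parse", "split", "extract"],
        "extract_schema": "financial_schema",
    },
}

# Fallback for an "other" category or any category without a dedicated route.
DEFAULT_ROUTE = {"steps": ["parse"]}


def route_document(category):
    """Pick downstream steps and settings for an already-classified document."""
    return ROUTES.get(category, DEFAULT_ROUTE)


plan = route_document("passport")
```

Keeping the routes in one table means adding a new document type is a data change rather than a new branch in the code.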
Quick Start
Play around at the Classify demo.
- Upload the document to get a file_id
- Call /classify with the file reference and a list of categories, each with criteria describing what makes a document belong to that category
- Get back the best-matching category
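The call in the steps above might look roughly like the following in Python. This is a sketch under assumptions, not an official client: the base URL, the bearer-token `Authorization` header, and the exact JSON body layout are not stated on this page. Only `/classify`, `input`, `classification_schema`, `category`, and `criteria` are taken from it. The standard library is used so the sketch has no dependencies; the same payload could be sent with `requests.post(url, json=payload, headers=headers)`.

```python
import json
import urllib.request

# Assumption: the base URL and bearer-token auth are not stated on this page.
BASE_URL = "https://platform.reducto.ai"


def build_classify_request(api_key, input_ref, classification_schema):
    """Build (but do not send) a POST request for /classify."""
    payload = {
        "input": input_ref,  # e.g. the reducto:// reference returned by /upload
        "classification_schema": classification_schema,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/classify",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def classify(api_key, input_ref, classification_schema, timeout=60):
    """Send the request and return the decoded JSON response body."""
    request = build_classify_request(api_key, input_ref, classification_schema)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example categories and criteria are made up for illustration.
schema = [
    {"category": "invoice", "criteria": ["lists line items with prices", "shows a total amount due"]},
    {"category": "contract", "criteria": ["contains signature blocks", "defines obligations between parties"]},
]
request = build_classify_request("YOUR_API_KEY", "reducto://abc123", schema)
```

Separating `build_classify_request` from `classify` lets you inspect or log the payload before anything is sent over the network.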
Response Format Details
Full breakdown of confidence scores, per-criterion reasoning, and all response fields.
Request Parameters
input (required)
The document to classify. Accepts the same formats as other Reducto endpoints:
| Format | Example | When to use |
|---|---|---|
| Upload response | reducto://abc123 | Local files uploaded via /upload |
| Public URL | https://example.com/doc.pdf | Publicly accessible documents |
| Presigned URL | https://bucket.s3.../doc.pdf?X-Amz-... | Files in your cloud storage |
classification_schema (required)
A list of categories you want to classify the document into. Each category has a name and a list of criteria describing what makes a document belong to that category.
| Field | Type | Description |
|---|---|---|
| category | string | The category name/label that documents will be classified into (e.g., "invoice", "contract", "receipt") |
| criteria | list[string] | A list of criteria, keywords, or descriptions that define what characteristics a document must have to be classified into this category |
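To illustrate the shape described in the table, the sketch below writes a schema as plain Python data and checks it locally before sending. The `validate_schema` helper is hypothetical (not part of any Reducto SDK), the example categories and criteria are made up, and the non-empty and uniqueness checks are this sketch's own conventions rather than documented API rules.

```python
def validate_schema(classification_schema):
    """Return a list of problems found; an empty list means the shape looks right."""
    if not isinstance(classification_schema, list) or not classification_schema:
        return ["classification_schema must be a non-empty list"]
    problems = []
    seen = set()
    for index, entry in enumerate(classification_schema):
        category = entry.get("category") if isinstance(entry, dict) else None
        criteria = entry.get("criteria") if isinstance(entry, dict) else None
        if not isinstance(category, str) or not category.strip():
            problems.append(f"entry {index}: 'category' must be a non-empty string")
        elif category in seen:
            problems.append(f"entry {index}: duplicate category {category!r}")
        else:
            seen.add(category)
        if (not isinstance(criteria, list) or not criteria
                or not all(isinstance(c, str) for c in criteria)):
            problems.append(f"entry {index}: 'criteria' must be a non-empty list of strings")
    return problems


good_schema = [
    {"category": "passport", "criteria": ["contains a machine-readable zone", "shows a photo and nationality"]},
    {"category": "immigration_form", "criteria": ["government form with numbered fields", "asks for visa or residency details"]},
]
# 'criteria' given as a single string instead of a list of strings.
bad_schema = [{"category": "passport", "criteria": "photo page"}]
```

Catching a malformed entry locally is cheaper than discovering it from a failed request.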
document_metadata (optional)
A metadata string to include in classification prompts; defaults to null. Use this to provide additional context about the document that may help with classification.
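A small sketch of attaching `document_metadata` to a request body. The helper name `with_metadata` and the sample metadata text are hypothetical; the only details taken from this page are that the field is an optional string that defaults to null.

```python
def with_metadata(payload, document_metadata=None):
    """Return a copy of the payload, adding document_metadata only when provided."""
    result = dict(payload)
    if document_metadata is not None:
        if not isinstance(document_metadata, str):
            raise TypeError("document_metadata must be a string")
        result["document_metadata"] = document_metadata
    return result


payload = {
    "input": "reducto://abc123",
    "classification_schema": [
        {"category": "doctors_note", "criteria": ["handwritten clinical notes"]},
    ],
}
enriched = with_metadata(payload, "Uploaded through a patient intake portal")
```

Omitting the key when there is no metadata leaves the default in place instead of sending an empty string.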
How It Works
- Document ingestion. Classify accepts your document and processes it to understand its content.
- Category evaluation. Each category in your classification_schema is evaluated against the document. The criteria you provide guide what the model looks for.
- Best match selection. Classify returns the single best-matching category. It compares all categories and picks the one whose criteria best describe the document.
Classify is synchronous only. The endpoint is optimized for low latency, so classification results return fast enough that async polling or webhooks are unnecessary.
Using Classify in a Pipeline
Classify is most useful when combined with other Reducto endpoints. A common pattern is to classify first, then route to different Parse/Extract configurations based on the result.
FAQs
How does Classify differ from Extract or Split?
Reducto can already classify documents after parsing by using Extract or Split. Extract pulls specific data out of a document alongside an enum-based classification. Split works best when you have multi-document packets. However, both of these endpoints require Parse.
Classify runs before Parse, and as such is optimized for latency and cost. Classify also returns structured reasoning and confidence.
What are the latency and accuracy trade-offs?
Classify is faster than Parse or Extract because it doesn’t need to do full document parsing. It focuses on high-level document understanding to match against your categories.
Accuracy depends on how well your criteria describe each category. Well-defined, mutually exclusive categories with specific criteria will yield the best results.
What types of classifications can I specify?
Any categories you want. You define both the category names and the criteria. Common examples include:
- Document type: invoice, contract, receipt, tax form
- Content characteristics: handwritten vs. typed, single-page vs. multi-page
- Domain-specific: ACORD forms vs. declarations pages, W-2 vs. 1099
- Processing needs: needs OCR enhancement vs. standard processing
Do I need to specify categories, or can Reducto infer them?
You must provide your own categories via classification_schema. Classify matches your document against the categories you define. It doesn’t auto-discover document types.
This is by design: your categories should reflect your specific pipeline needs. A healthcare company and a law firm would classify the same document differently based on their downstream processing requirements.
Can I use Classify with Parse and Extract?
Yes. This is the primary use case. Classify sits at the top of your pipeline to route documents, then you call Parse, Extract, or Split with configurations tailored to each document type.
See Using Classify in a Pipeline above for the routing pattern.
What file types does Classify support?
Classify is optimized for PDFs and images (PNG, JPEG, etc.), but also supports the remaining formats supported by Reducto’s Parse endpoint.
What happens if my document doesn't match any category well?
Classify always returns the best match from your schema, even if the fit isn’t perfect. If you need to handle unrecognized documents, add an "other" category with criteria like "does not match any of the other document types". Be sure to enumerate the types it should not match.
Related
Response Format
Confidence scores, per-criterion reasoning, and all response fields.
Best Practices
Write better classification schemas for more accurate results.