Classify Configuration

Classify accepts a document and a list of categories, then returns the best match. This page covers the configuration options that control how classification works.

Page Range

By default, Classify uses the first 5 pages of a document as context for classification. For most documents, the first few pages contain enough information to determine document type (cover pages, headers, introductory sections). You can increase context up to 10 pages using the page_range parameter when distinguishing content appears deeper in the document.

from reducto import Reducto

client = Reducto()

response = client.classify.run(
    input="https://example.com/document.pdf",
    page_range={"start": 1, "end": 10},
    classification_schema=[
        {
            "category": "annual_report",
            "criteria": ["financial statements", "shareholder letter", "auditor's report"],
        },
        {
            "category": "quarterly_filing",
            "criteria": ["quarterly results", "interim statements"],
        },
    ],
)

Page numbers are 1-indexed (first page is page 1).
Both start and end are inclusive.
If no page_range is specified, the first 5 pages are used.
If more than 10 pages are selected, the request returns an error.
page_range applies only to PDFs and is ignored for other document types.
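
The rules above can be checked on the client before a request is sent. The following is a minimal sketch: validate_page_range is a hypothetical helper, not part of the Reducto SDK, and it assumes the 10-page limit applies to the number of pages selected rather than to absolute page numbers.

```python
MAX_PAGES = 10  # limit stated in the rules above; not read from the SDK


def validate_page_range(page_range: dict) -> int:
    """Return the number of pages selected, or raise ValueError.

    Hypothetical helper: mirrors the documented rules (1-indexed pages,
    inclusive bounds, at most 10 pages) without calling the API.
    """
    start, end = page_range["start"], page_range["end"]
    if start < 1:
        raise ValueError("page numbers are 1-indexed; start must be >= 1")
    if end < start:
        raise ValueError("end must be >= start")
    count = end - start + 1  # both bounds are inclusive
    if count > MAX_PAGES:
        raise ValueError(f"{count} pages selected; the maximum is {MAX_PAGES}")
    return count


print(validate_page_range({"start": 1, "end": 10}))  # 10
```

Running a check like this locally catches an oversized range before a request is made.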

Each page of context costs 0.5 credits. Using the default 5 pages costs 2.5 credits per classification. Increasing to 10 pages costs 5.0 credits. Only increase when the default pages don’t contain enough distinguishing content. See Credit Usage for details.
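
The arithmetic above can be written out as a small estimator. This is a sketch only: classification_credits is a hypothetical helper, and it assumes every page in the selected range exists in the document and is billed at the stated rate.

```python
CREDITS_PER_PAGE = 0.5  # rate stated above
DEFAULT_PAGES = 5       # pages used when page_range is omitted


def classification_credits(page_range=None) -> float:
    """Estimate credits for one classification call (hypothetical helper)."""
    if page_range is None:
        pages = DEFAULT_PAGES
    else:
        # start and end are both inclusive
        pages = page_range["end"] - page_range["start"] + 1
    return pages * CREDITS_PER_PAGE


print(classification_credits())                          # 2.5
print(classification_credits({"start": 1, "end": 10}))   # 5.0
```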

Classification Schema

The classification_schema parameter defines what categories Classify can return. Each category needs a name and a list of criteria.
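
As an illustration of that shape, a schema can be sanity-checked before it is passed to classification_schema. The helper below is hypothetical and not part of the SDK; requiring non-empty strings and a non-empty criteria list is an assumption made for this sketch, not a documented API rule.

```python
def check_schema(schema: list) -> None:
    """Raise ValueError if an entry lacks a category name or a criteria list.

    Hypothetical helper. The non-empty checks are assumptions of this
    sketch, not rules confirmed by the API.
    """
    for i, entry in enumerate(schema):
        name = entry.get("category")
        criteria = entry.get("criteria")
        if not isinstance(name, str) or not name.strip():
            raise ValueError(f"entry {i}: 'category' must be a non-empty string")
        if not isinstance(criteria, list) or not criteria:
            raise ValueError(f"entry {i}: 'criteria' must be a non-empty list")
        if not all(isinstance(c, str) and c.strip() for c in criteria):
            raise ValueError(f"entry {i}: every criterion must be a non-empty string")


# A well-formed entry passes silently.
check_schema([
    {"category": "invoice", "criteria": ["total amount due"]},
])
```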

Writing effective criteria

Criteria are natural language descriptions that tell the model what to look for. More specific criteria produce better results. Good criteria describe observable features:

“Contains a table of itemized charges with quantities and unit prices”
“Includes signature blocks for multiple parties”
“Has a header with ‘INVOICE’ or invoice number”

Weak criteria are too generic:

“Business document”
“Has text”
“Contains information”

Example: Financial document routing

# Assumes `client` from the earlier example and `upload` from a prior file upload.
response = client.classify.run(
    input=upload.file_id,
    classification_schema=[
        {
            "category": "invoice",
            "criteria": [
                "itemized list of charges or line items",
                "total amount due",
                "billing and payment information",
                "vendor or supplier details",
            ],
        },
        {
            "category": "bank_statement",
            "criteria": [
                "account balance and transaction history",
                "deposits and withdrawals listed by date",
                "bank name and account number",
            ],
        },
        {
            "category": "tax_form",
            "criteria": [
                "tax identification numbers (SSN, EIN)",
                "income and deduction categories",
                "IRS form number (W-2, 1099, 1040)",
            ],
        },
        {
            "category": "receipt",
            "criteria": [
                "single transaction with date and amount",
                "store or merchant name",
                "payment method (cash, card, etc.)",
            ],
        },
    ],
)
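
Once a category name has been read from the response, routing can be a simple lookup. The sketch below is illustrative only: the queue names and the route function are hypothetical, and how the category is read from the classify response is not shown on this page, so the function takes the category as a plain string.

```python
# Hypothetical destinations, keyed by the category names in the schema above.
ROUTES = {
    "invoice": "accounts-payable",
    "bank_statement": "reconciliation",
    "tax_form": "tax-filing",
    "receipt": "expenses",
}


def route(category: str, doc_id: str) -> str:
    """Return a destination label for a classified document.

    Categories that are not in ROUTES fall back to manual review.
    """
    queue = ROUTES.get(category, "manual-review")
    return f"{queue}:{doc_id}"


print(route("invoice", "doc-1"))  # accounts-payable:doc-1
```

The fallback keeps documents with an unexpected category from being dropped silently.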

Related

Classify Overview: Introduction to document classification.
Chaining API Calls: Route classified documents to Parse and Extract.
Credit Usage: Classification pricing details.
