# Classify

> Categorize documents before you parse

Classify unlocks conditional logic in your pipeline. You specify criteria for document classification, and Classify returns the best match from the categories you provide.

While [Parse](/parse/overview) reads documents, [Extract](/extract/overview) pulls data out, and [Split](/split) divides documents into sections, Classify tells you **what kind of document you're looking at** before any of that happens. This lets you route documents to the right pipeline with the right configurations upstream of further processing.

***

## When to Use Classify

Use Classify when different document types need different handling. Instead of parsing every document with the same settings, you identify each document's type first and then choose the configuration that fits it.

**Common use cases:**

* **Document triage during onboarding.** Quickly classify documents into categories when users upload files on your platform. For example, sort uploads into different types of legal documents.
* **Conditional parsing configurations.** Classify handwritten doctor's notes versus other forms, then enable agentic text in Parse with specific prompts downstream for the doctor's notes.
* **Schema routing for extraction.** Classify your documents upfront to apply different extraction schemas downstream. Passports get one schema, immigration forms get another.
* **Pipeline branching.** Use classification results to decide whether a document needs Split, Extract, or both, and with which Parse configurations. For example, some financial documents may need specific splits, while others may not.

Use Classify on its own to label documents as part of a larger user flow, or apply Classify before your existing Reducto pipeline.

***

## Quick Start

Try Classify interactively in [the Classify demo](https://classify.reducto.ai), or run one of the examples below.

<CodeGroup>
  ```python Python theme={null}
  from reducto import Reducto

  client = Reducto()

  response = client.classify.run(
      input="https://example.com/document.pdf",
      classification_schema=[
          {
              "category": "invoice",
              "criteria": [
                  "contains billing information",
                  "has itemized charges",
                  "includes payment details or amounts due",
              ],
          },
          {
              "category": "contract",
              "criteria": [
                  "contains legal terms and conditions",
                  "includes signature lines or parties involved",
                  "references obligations or agreements",
              ],
          },
          {
              "category": "receipt",
              "criteria": [
                  "shows a completed transaction",
                  "includes items purchased with prices",
                  "has a total amount paid",
              ],
          },
      ],
  )
  print(response)
  ```

  ```javascript Node.js theme={null}
  import Reducto from 'reductoai';

  const client = new Reducto();

  const response = await client.classify.run({
    input: 'https://example.com/document.pdf',
    classification_schema: [
      {
        category: 'invoice',
        criteria: [
          'contains billing information',
          'has itemized charges',
          'includes payment details or amounts due',
        ],
      },
      {
        category: 'contract',
        criteria: [
          'contains legal terms and conditions',
          'includes signature lines or parties involved',
          'references obligations or agreements',
        ],
      },
      {
        category: 'receipt',
        criteria: [
          'shows a completed transaction',
          'includes items purchased with prices',
          'has a total amount paid',
        ],
      },
    ],
  });
  console.log(response);
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/classify \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/document.pdf",
      "classification_schema": [
        {
          "category": "invoice",
          "criteria": [
            "contains billing information",
            "has itemized charges",
            "includes payment details or amounts due"
          ]
        },
        {
          "category": "contract",
          "criteria": [
            "contains legal terms and conditions",
            "includes signature lines or parties involved",
            "references obligations or agreements"
          ]
        },
        {
          "category": "receipt",
          "criteria": [
            "shows a completed transaction",
            "includes items purchased with prices",
            "has a total amount paid"
          ]
        }
      ]
    }'
  ```
</CodeGroup>

**What this does:**

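1. **Passes** the document to Classify as a URL. For local files, upload first and pass the upload reference instead (see [Using Classify in a Pipeline](#using-classify-in-a-pipeline))
2. **Calls `/classify`** with the document and a list of categories, each with criteria describing what makes a document belong to that category
3. **Returns** the best-matching category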

```json theme={null}
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "result": {
    "category": "invoice"
  }
}
```
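
If you call the REST endpoint directly, the category can be read from the JSON body. The following is a minimal sketch that relies only on the fields shown in the sample response above. The `read_category` helper is a hypothetical name, and its error handling is an assumption rather than documented behavior.

```python
import json

# Sample body copied from the response shown above.
SAMPLE_RESPONSE = """
{
  "job_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "result": {
    "category": "invoice"
  }
}
"""


def read_category(body: str) -> str:
    """Return result.category from a Classify JSON response body.

    Hypothetical helper: raises ValueError if the expected fields are missing.
    """
    payload = json.loads(body)
    try:
        return payload["result"]["category"]
    except (KeyError, TypeError) as exc:
        raise ValueError("response has no result.category field") from exc


print(read_category(SAMPLE_RESPONSE))  # prints: invoice
```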

Classify works on every file type that Parse supports. To inspect a classification visually, use [the Classify demo](https://classify.reducto.ai).

<Card title="Response Format Details" icon="brackets-curly" href="/classify/response-format">
  Full breakdown of confidence scores, per-criterion reasoning, and all response fields.
</Card>

***

## Request Parameters

<CodeGroup>
  ```python Python theme={null}
  from reducto import Reducto

  client = Reducto()

  response = client.classify.run(
      input="...",                      # Required: upload response or URL
      classification_schema=[...],       # Required: categories with criteria
      page_range={"start": 1, "end": 10},  # Optional: pages to use as context
      document_metadata="...",           # Optional: additional context
  )
  ```

  ```javascript Node.js theme={null}
  import Reducto from 'reductoai';

  const client = new Reducto();

  const response = await client.classify.run({
    input: '...',                        // Required: upload response or URL
    classification_schema: [...],        // Required: categories with criteria
    page_range: { start: 1, end: 10 },   // Optional: pages to use as context
    document_metadata: '...',            // Optional: additional context
  });
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/classify \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "YOUR_FILE_ID_OR_URL",
      "classification_schema": [{"category": "...", "criteria": ["..."]}],
      "page_range": {"start": 1, "end": 10},
      "document_metadata": "optional context string"
    }'
  ```
</CodeGroup>

### input (required)

The document to classify. Accepts the same formats as other Reducto endpoints:

| Format          | Example                                  | When to use                        |
| --------------- | ---------------------------------------- | ---------------------------------- |
| Upload response | `reducto://abc123`                       | Local files uploaded via `/upload` |
| Public URL      | `https://example.com/doc.pdf`            | Publicly accessible documents      |
| Presigned URL   | `https://bucket.s3.../doc.pdf?X-Amz-...` | Files in your cloud storage        |

### classification\_schema (required)

A list of categories you want to classify the document into. Each category has a name and a list of criteria describing what makes a document belong to that category.

```json theme={null}
{
  "classification_schema": [
    {
      "category": "invoice",
      "criteria": [
        "contains billing information",
        "has itemized charges",
        "includes payment details"
      ]
    },
    {
      "category": "contract",
      "criteria": [
        "contains legal terms and conditions",
        "includes signature lines",
        "references parties and obligations"
      ]
    }
  ]
}
```

| Field      | Type           | Description                                                                                                                             |
| ---------- | -------------- | --------------------------------------------------------------------------------------------------------------------------------------- |
| `category` | `string`       | The category name/label that documents will be classified into (e.g., `"invoice"`, `"contract"`, `"receipt"`)                           |
| `criteria` | `list[string]` | A list of criteria, keywords, or descriptions that define what characteristics a document must have to be classified into this category |
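
It can help to check a schema locally before sending it. The sketch below validates the two fields in the table above. The specific checks (non-empty values, unique category names) and the `validate_schema` name are assumptions for illustration, not rules the API is documented to enforce.

```python
from typing import Any


def validate_schema(schema: list[dict[str, Any]]) -> list[str]:
    """Return a list of problems found in a classification_schema (empty if none)."""
    problems: list[str] = []
    seen: set[str] = set()
    if not schema:
        problems.append("schema must contain at least one category")
    for index, entry in enumerate(schema):
        category = entry.get("category")
        criteria = entry.get("criteria")
        # Assumed rule: category is a non-empty, unique string.
        if not isinstance(category, str) or not category.strip():
            problems.append(f"entry {index}: 'category' must be a non-empty string")
        elif category in seen:
            problems.append(f"entry {index}: duplicate category '{category}'")
        else:
            seen.add(category)
        # Assumed rule: criteria is a non-empty list of non-empty strings.
        if not isinstance(criteria, list) or not criteria:
            problems.append(f"entry {index}: 'criteria' must be a non-empty list")
        elif not all(isinstance(c, str) and c.strip() for c in criteria):
            problems.append(f"entry {index}: every criterion must be a non-empty string")
    return problems


schema = [
    {"category": "invoice", "criteria": ["contains billing information"]},
    {"category": "invoice", "criteria": []},
]
for problem in validate_schema(schema):
    print(problem)
```

Returning a list of problems instead of raising on the first one lets you report every issue in a schema at once.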

### page\_range

`page_range` (optional, defaults to the first 5 pages): The page range to use as context for classification. Accepts an object with `start` and `end` fields (1-indexed, inclusive). If more than 10 pages are selected, the request returns an error. Only applies to PDFs.

See [Classify Configuration](/configs/classify/configuration) for examples and cost implications.
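
Because the range is inclusive, `{"start": 1, "end": 10}` selects exactly 10 pages. The sketch below builds a `page_range` object and rejects ranges over the 10-page limit before a request is sent. The `make_page_range` helper and its error messages are hypothetical.

```python
MAX_PAGES = 10  # limit stated above: selecting more than 10 pages returns an error


def make_page_range(start: int, end: int) -> dict[str, int]:
    """Build a page_range object (1-indexed, inclusive) or raise ValueError."""
    if start < 1:
        raise ValueError("start must be >= 1 (pages are 1-indexed)")
    if end < start:
        raise ValueError("end must be >= start")
    page_count = end - start + 1  # inclusive on both ends
    if page_count > MAX_PAGES:
        raise ValueError(f"{page_count} pages selected; the limit is {MAX_PAGES}")
    return {"start": start, "end": end}


print(make_page_range(3, 12))  # 10 pages: pages 3 through 12
```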

### document\_metadata

`document_metadata` (optional, defaults to `null`): A metadata string to include in classification prompts. Use this to provide additional context about the document that may help with classification.
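
One way to use this field is to pass along context your application already has about the upload. The sketch below joins known values into a single string. The `key: value` layout, the field names, and the `build_document_metadata` helper are all assumptions; the parameter itself only requires a string.

```python
from typing import Optional


def build_document_metadata(context: dict[str, Optional[str]]) -> Optional[str]:
    """Join known context into one string; return None when nothing is known."""
    lines = [
        f"{key}: {value.strip()}"
        for key, value in context.items()
        if value and value.strip()  # skip missing or blank values
    ]
    return "\n".join(lines) or None


metadata = build_document_metadata(
    {
        "filename": "intake_form_2024.pdf",  # hypothetical example values
        "upload_source": "patient portal",
        "uploader_note": None,
    }
)
print(metadata)
```

Returning `None` when no context is available matches the parameter's default, so the result can be passed as `document_metadata` unconditionally.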

***

## How It Works

1. **Document ingestion.** Classify reads the document, using the pages selected by `page_range` as context.
2. **Category evaluation.** Each category in your `classification_schema` is evaluated against the document. The criteria you provide guide what the model looks for.
3. **Best match selection.** Classify returns the single best-matching category. It compares all categories and picks the one whose criteria best describe the document.

Classify is optimized for latency. It's a lightweight operation compared to full parsing or extraction, designed to return results fast enough to use as an inline routing step without adding meaningful overhead to your pipeline.

<Note>
  Classify is synchronous only. The endpoint is optimized for low latency, so classification results return fast enough that async polling or webhooks are unnecessary.
</Note>

***

## Using Classify in a Pipeline

Classify is most useful when combined with other Reducto endpoints. Here's a common pattern: classify first, then route to different Parse/Extract configurations based on the result.

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path
  from reducto import Reducto

  client = Reducto()

  # Step 1: Upload
  upload = client.upload(file=Path("document.pdf"))

  # Step 2: Classify
  classification = client.classify.run(
      input=upload.file_id,
      classification_schema=[
          {
              "category": "handwritten_notes",
              "criteria": ["contains handwritten text", "informal layout", "pen or pencil marks"],
          },
          {
              "category": "printed_form",
              "criteria": ["structured layout with fields", "typed text", "checkboxes or form widgets"],
          },
      ],
  )
  category = classification.result.category

  # Step 3: Route to appropriate pipeline
  if category == "handwritten_notes":
      result = client.parse.run(
          input=upload.file_id,
          enhance={"agentic": [{"scope": "text", "prompt": "This is a handwritten medical note. Pay close attention to medication names and dosages."}]},
      )
  else:
      # Standard extraction for printed forms
      result = client.extract.run(
          input=upload.file_id,
          instructions={
              "schema": {
                  "type": "object",
                  "properties": {
                      "patient_name": {"type": "string"},
                      "date_of_birth": {"type": "string"},
                      "insurance_id": {"type": "string"},
                  },
              }
          },
      )

  print(result)
  ```

  ```javascript Node.js theme={null}
  import fs from 'fs';
  import Reducto from 'reductoai';

  const client = new Reducto();

  // Step 1: Upload
  const upload = await client.upload({ file: fs.createReadStream('document.pdf') });

  // Step 2: Classify
  const classification = await client.classify.run({
    input: upload.file_id,
    classification_schema: [
      {
        category: 'handwritten_notes',
        criteria: ['contains handwritten text', 'informal layout', 'pen or pencil marks'],
      },
      {
        category: 'printed_form',
        criteria: ['structured layout with fields', 'typed text', 'checkboxes or form widgets'],
      },
    ],
  });
  const category = classification.result.category;

  // Step 3: Route to appropriate pipeline
  if (category === 'handwritten_notes') {
    const result = await client.parse.run({
      input: upload.file_id,
      enhance: { agentic: [{ scope: 'text', prompt: 'This is a handwritten medical note. Pay close attention to medication names and dosages.' }] },
    });
    console.log(result);
  } else {
    // Standard extraction for printed forms
    const result = await client.extract.run({
      input: upload.file_id,
      instructions: {
        schema: {
          type: 'object',
          properties: {
            patient_name: { type: 'string' },
            date_of_birth: { type: 'string' },
            insurance_id: { type: 'string' },
          },
        },
      },
    });
    console.log(result);
  }
  ```

  ```bash cURL theme={null}
  # Step 1: Upload
  FILE_ID=$(curl -s -X POST https://platform.reducto.ai/upload \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -F "file=@document.pdf" | jq -r '.file_id')

  # Step 2: Classify
  CATEGORY=$(curl -s -X POST https://platform.reducto.ai/classify \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "'"$FILE_ID"'",
      "classification_schema": [
        {
          "category": "handwritten_notes",
          "criteria": ["contains handwritten text", "informal layout", "pen or pencil marks"]
        },
        {
          "category": "printed_form",
          "criteria": ["structured layout with fields", "typed text", "checkboxes or form widgets"]
        }
      ]
    }' | jq -r '.result.category')

  # Step 3: Route to appropriate pipeline
  if [ "$CATEGORY" = "handwritten_notes" ]; then
    curl -X POST https://platform.reducto.ai/parse \
      -H "Authorization: Bearer $REDUCTO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "input": "'"$FILE_ID"'",
        "enhance": {
          "agentic": [{"scope": "text", "prompt": "This is a handwritten medical note. Pay close attention to medication names and dosages."}]
        }
      }'
  else
    curl -X POST https://platform.reducto.ai/extract \
      -H "Authorization: Bearer $REDUCTO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "input": "'"$FILE_ID"'",
        "instructions": {
          "schema": {
            "type": "object",
            "properties": {
              "patient_name": {"type": "string"},
              "date_of_birth": {"type": "string"},
              "insurance_id": {"type": "string"}
            }
          }
        }
      }'
  fi
  ```
</CodeGroup>
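
As the number of categories grows, the `if`/`else` branch above can be replaced with a lookup table. The following is a sketch of that pattern only: the handler functions are hypothetical stubs standing in for the Parse and Extract calls, and the manual-review fallback for unexpected categories is an assumption.

```python
from typing import Callable


# Stub handlers standing in for the Parse and Extract calls shown above.
def parse_handwritten(file_id: str) -> str:
    return f"parse with agentic text: {file_id}"


def extract_printed_form(file_id: str) -> str:
    return f"extract with patient schema: {file_id}"


def send_to_review(file_id: str) -> str:
    return f"manual review: {file_id}"


# One entry per category in the classification_schema.
ROUTES: dict[str, Callable[[str], str]] = {
    "handwritten_notes": parse_handwritten,
    "printed_form": extract_printed_form,
}


def route(category: str, file_id: str) -> str:
    """Dispatch a classified document to its handler, with a fallback."""
    handler = ROUTES.get(category, send_to_review)
    return handler(file_id)


print(route("handwritten_notes", "reducto://abc123"))
```

Keeping the routes in one dictionary means adding a category is a one-line change, and the fallback keeps an unexpected label from raising a `KeyError`.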

***

## FAQs

<AccordionGroup>
  <Accordion title="How does Classify differ from Extract or Split?">
    Reducto can already classify documents after parsing, using Extract or Split.

    Extract pulls **specific data** out of a document alongside an enum-based classification. Split works best when you have multi-document packets. However, both of these endpoints depend on Parse.

    Classify runs **before** Parse, so it is optimized for latency and cost. It also returns structured reasoning and confidence.
  </Accordion>

  <Accordion title="What are the latency and accuracy trade-offs?">
    Classify is faster than Parse or Extract because it doesn't need to do full document parsing. It focuses on high-level document understanding to match against your categories.

    Accuracy depends on how well your criteria describe each category. Well-defined, mutually exclusive categories with specific criteria will yield the best results.
  </Accordion>

  <Accordion title="What types of classifications can I specify?">
    Any categories you want. You define both the category names and the criteria. Common examples include:

    * **Document type**: invoice, contract, receipt, tax form
    * **Content characteristics**: handwritten vs. typed, single-page vs. multi-page
    * **Domain-specific**: ACORD forms vs. declarations pages, W-2 vs. 1099
    * **Processing needs**: needs OCR enhancement vs. standard processing
  </Accordion>

  <Accordion title="Do I need to specify categories, or can Reducto infer them?">
    You must provide your own categories via `classification_schema`. Classify matches your document against the categories you define. It doesn't auto-discover document types.

    This is by design: your categories should reflect your specific pipeline needs. A healthcare company and a law firm would classify the same document differently based on their downstream processing requirements.
  </Accordion>

  <Accordion title="Can I use Classify with Parse and Extract?">
    Yes. This is the primary use case. Classify sits at the top of your pipeline to route documents, then you call Parse, Extract, or Split with configurations tailored to each document type.

    See [Using Classify in a Pipeline](#using-classify-in-a-pipeline) above for a complete example.
  </Accordion>

  <Accordion title="What file types does Classify support?">
    Classify is optimized for PDFs and images (PNG, JPEG, etc.), and it also accepts every other format that Reducto's Parse endpoint supports.
  </Accordion>

  <Accordion title="What happens if my document doesn't match any category well?">
    Classify always returns the best match from your schema, even if the fit isn't perfect. If you need to handle unrecognized documents, add an `"other"` category with criteria like `"does not match any of the other document types"`. In that category's criteria, be sure to enumerate the types it should not match.
  </Accordion>
</AccordionGroup>
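
The catch-all category described in the last FAQ can be generated from the rest of the schema so that it always enumerates the other types. This is a minimal sketch: the `with_other_category` helper is hypothetical, and the wording of the generated criteria is an assumption you should tune for your own documents.

```python
from typing import Any


def with_other_category(
    schema: list[dict[str, Any]], name: str = "other"
) -> list[dict[str, Any]]:
    """Return a copy of schema with a catch-all category appended."""
    known = [entry["category"] for entry in schema]
    if name in known:
        raise ValueError(f"schema already has a '{name}' category")
    criteria = ["does not match any of the other document types"]
    # Enumerate each known type so the catch-all states what it excludes.
    criteria += [f"does not fit the '{category}' category" for category in known]
    return [*schema, {"category": name, "criteria": criteria}]


base_schema = [
    {"category": "invoice", "criteria": ["contains billing information"]},
    {"category": "receipt", "criteria": ["shows a completed transaction"]},
]
for entry in with_other_category(base_schema):
    print(entry["category"], entry["criteria"])
```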

***

## Related

<CardGroup cols={2}>
  <Card title="Response Format" icon="brackets-curly" href="/classify/response-format">
    Confidence scores, per-criterion reasoning, and all response fields.
  </Card>

  <Card title="Best Practices" icon="gauge-high" href="/classify/best-practices">
    Write better classification schemas for more accurate results.
  </Card>

  <Card title="Classify Configuration" icon="gear" href="/configs/classify/configuration">
    Page ranges and classification schema options.
  </Card>

  <Card title="Credit Usage" icon="coins" href="/reference/credit-usage">
    How Classify credits are calculated.
  </Card>
</CardGroup>
