The quality of your classification depends heavily on how you define your categories and criteria. Here are guidelines for getting the best results.

Be Specific with Criteria

Criteria should describe concrete, observable characteristics of the document: things that would be visible on the page.
Think of criteria as instructions you’d give a human reviewer: “Look for X, Y, and Z to identify this type of document.”
Good criteria describe what you’d actually see in the document:
{
  "category": "invoice",
  "criteria": [
    "contains an invoice number or reference number",
    "has line items with quantities and unit prices",
    "shows a total amount due or balance",
    "includes vendor/supplier contact information"
  ]
}
Weak criteria are too vague or abstract:
{
  "category": "invoice",
  "criteria": [
    "is an invoice",
    "looks like an invoice"
  ]
}
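One way to catch weak criteria before sending a schema is a small lint pass. The sketch below is a hypothetical helper, not part of Classify: `find_vague_criteria` and its filler-word list are assumptions chosen for illustration. It flags any criterion that only restates the category name and adds no observable detail.

```python
import re

# Hypothetical helper (not part of Classify): words that add no observable detail.
FILLER = {"is", "a", "an", "the", "looks", "like", "this", "document",
          "type", "of", "seems", "to", "be"}

def find_vague_criteria(category, criteria):
    """Return criteria that only restate the category name."""
    name = category.replace("_", " ").lower()
    vague = []
    for criterion in criteria:
        # Drop the category name, then see whether anything meaningful is left.
        remainder = criterion.lower().replace(name, " ")
        words = re.findall(r"[a-z0-9]+", remainder)
        if not [w for w in words if w not in FILLER]:
            vague.append(criterion)
    return vague

weak = find_vague_criteria("invoice", ["is an invoice", "looks like an invoice"])
good = find_vague_criteria("invoice", [
    "contains an invoice number or reference number",
    "has line items with quantities and unit prices",
])
```

Here `weak` contains both criteria from the weak example, while `good` is empty. The check is deliberately crude: it catches restatements, not every vague phrasing.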

Make Categories Mutually Exclusive

Design your categories so that a given document clearly belongs to one category over others. If categories overlap significantly, classification accuracy will suffer. For example, these two insurance document types come from the same domain but share no criteria:
[
  {
    "category": "declaration_page",
    "criteria": [
      "summarizes active coverage for a specific policy term",
      "includes policy number, named insured, and insurer details",
      "shows effective and expiration dates",
      "lists coverage types with limits and deductibles",
      "states total premium and billing or installment info"
    ]
  },
  {
    "category": "explanation_of_benefits",
    "criteria": [
      "explains how a submitted medical claim was processed",
      "includes member ID, claim number, and service dates",
      "breaks down billed amount, allowed amount, and insurer payment",
      "shows patient responsibility (copay, deductible, coinsurance)",
      "clearly indicates this is not a bill"
    ]
  }
]
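Overlap between categories can be roughly estimated before running any documents. The sketch below compares every pair of criteria across categories using Jaccard similarity on word tokens. The function name, the tokenization, and the 0.5 threshold are all assumptions for illustration. Word overlap is only a proxy for semantic overlap, so treat hits as prompts for manual review.

```python
import re
from itertools import combinations

def _tokens(text):
    """Lowercased word tokens of a criterion."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlapping_criteria(schema, threshold=0.5):
    """Return (category_a, criterion_a, category_b, criterion_b, score)
    for criteria in different categories whose token sets are similar."""
    hits = []
    for a, b in combinations(schema, 2):
        for ca in a["criteria"]:
            for cb in b["criteria"]:
                ta, tb = _tokens(ca), _tokens(cb)
                union = ta | tb
                score = len(ta & tb) / len(union) if union else 0.0
                if score >= threshold:
                    hits.append((a["category"], ca, b["category"], cb, round(score, 2)))
    return hits

# Illustrative schema with one criterion duplicated across categories.
sample_schema = [
    {"category": "invoice", "criteria": ["shows a total amount due or balance"]},
    {"category": "receipt", "criteria": ["shows a total amount due or balance",
                                         "shows payment method"]},
]
overlaps = overlapping_criteria(sample_schema)
```

In this sample, `overlaps` holds a single hit for the duplicated criterion, which is a signal to reword it for one of the two categories.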

Text vs. Image-Based Classification

Classify works with both text-heavy and visually distinct documents. Your criteria can reference either textual content or visual characteristics:
  • Text-based criteria: "contains the words 'Terms and Conditions'", "includes a table of financial figures"
  • Visual/structural criteria: "has a photo ID section", "contains handwritten notes", "includes a signature block"
For documents that are primarily distinguished by layout rather than text (e.g., a passport vs. a driver’s license), include structural criteria like "contains a machine-readable zone at the bottom" or "has a photo in the upper-left corner".

Use Enough Categories

You must provide at least two categories. Classify returns the best match from your schema, so even if none of the categories are a perfect fit, it will return the closest one. If you need an escape hatch, add a catch-all category:
{
  "category": "other",
  "criteria": [
    "does not match any of the other document types (list all the other document types)",
    "unrecognized format or content"
  ]
}
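The catch-all category can be generated from the rest of the schema so that its first criterion always lists the current document types. This is a minimal sketch. `with_catch_all` is a hypothetical helper, and it assumes the schema is a list of objects with `category` and `criteria` keys, as in the examples above.

```python
def with_catch_all(schema):
    """Return a copy of the schema with an 'other' category appended."""
    names = [c["category"] for c in schema]
    result = list(schema)
    if "other" not in names:
        result.append({
            "category": "other",
            "criteria": [
                # Spell out the other document types by name.
                "does not match any of the other document types ("
                + ", ".join(names) + ")",
                "unrecognized format or content",
            ],
        })
    # At least two categories are required.
    if len(result) < 2:
        raise ValueError("schema must contain at least two categories")
    return result

full_schema = with_catch_all([
    {"category": "invoice", "criteria": ["contains an invoice number or reference number"]},
    {"category": "receipt", "criteria": ["shows payment method"]},
])
```

Generating the list this way keeps the catch-all in sync when categories are added or renamed.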

Use Confidence Scores to Refine Your Schema

The confidence breakdown in the response tells you exactly which criteria matched or didn’t for every category. Use this to iterate on your schema:
  1. Run Classify on a batch of sample documents.
  2. Check documents where the winning category had low confidence (e.g., below 0.7).
  3. Inspect the criteria_confidence to see which criteria are too broad, too narrow, or overlapping with other categories.
  4. Adjust your criteria and re-run until confidence scores improve.
Low-confidence classifications often point to criteria that are shared across multiple categories. Make each criterion as distinctive to its category as possible.
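The review loop above can be sketched in a few lines. The response shape used here is an assumption, not the documented format. Each result is taken to be a dict with a winning `category`, an overall `confidence`, and a `criteria_confidence` mapping from category name to per-criterion scores. Adjust the field access to match the actual Response Format.

```python
from collections import defaultdict

LOW_CONFIDENCE = 0.7  # example threshold from step 2

def weakest_criteria(results, threshold=LOW_CONFIDENCE):
    """Rank the winning category's criteria by average score,
    looking only at documents classified with low confidence."""
    scores = defaultdict(list)
    for result in results:
        if result["confidence"] >= threshold:
            continue  # confident classifications need no review
        winner = result["category"]
        # ASSUMED shape: {category: {criterion: score}}
        for criterion, score in result["criteria_confidence"][winner].items():
            scores[(winner, criterion)].append(score)
    averaged = {key: sum(vals) / len(vals) for key, vals in scores.items()}
    # Lowest average first: these criteria are the first candidates to rewrite.
    return sorted(averaged.items(), key=lambda item: item[1])

# Made-up sample results, for illustration only.
sample_results = [
    {"category": "invoice", "confidence": 0.6,
     "criteria_confidence": {"invoice": {"has line items": 0.2, "shows a total": 0.9}}},
    {"category": "invoice", "confidence": 0.95,
     "criteria_confidence": {"invoice": {"has line items": 0.9, "shows a total": 0.9}}},
]
ranked = weakest_criteria(sample_results)
```

With the sample data, `ranked` lists "has line items" first, because only the low-confidence document is counted and that criterion scored lowest in it.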

Classify Overview

Quick start, request parameters, and pipeline integration.

Response Format

Confidence scores, per-criterion reasoning, and all response fields.