The quality of your classification depends heavily on how you define your categories and criteria. Here are guidelines for getting the best results.
Be Specific with Criteria
Criteria should describe concrete, observable characteristics of the document, meaning things that would be visible on the page. Good criteria describe what you'd actually see in the document, such as "contains a machine-readable zone at the bottom" or "has a photo in the upper-left corner".
Make Categories Mutually Exclusive
Design your categories so that a given document clearly belongs to one category and not the others. If categories overlap significantly, classification accuracy will suffer.
Text vs. Image-Based Classification
Classify works with both text-heavy and visually distinct documents. Your criteria can reference either textual content or visual characteristics:
- Text-based criteria: "contains the words 'Terms and Conditions'", "includes a table of financial figures"
- Visual/structural criteria: "has a photo ID section", "contains handwritten notes", "includes a signature block"
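To make these guidelines concrete, here is a minimal sketch of a schema written as plain Python data. The list-of-dicts layout, the category names, and the assignment of the example criteria to those categories are all hypothetical; the real request parameters are described in Classify Overview. The helper shared_criteria is not part of Classify either. It is a naive lint that flags criteria appearing, after lowercasing, in more than one category, which is the most obvious way categories stop being mutually exclusive.

```python
from collections import defaultdict

# Hypothetical schema layout: a list of categories, each with a name and a list
# of concrete, observable criteria. See Classify Overview for the real format.
SCHEMA = [
    {
        "category": "passport",
        "criteria": [
            "contains a machine-readable zone at the bottom",  # visual/structural
            "has a photo in the upper-left corner",  # visual/structural
        ],
    },
    {
        "category": "contract",
        "criteria": [
            "contains the words 'Terms and Conditions'",  # text-based
            "includes a signature block",  # visual/structural
        ],
    },
    {
        "category": "financial_statement",
        "criteria": [
            "includes a table of financial figures",  # text-based
        ],
    },
]


def shared_criteria(schema):
    """Map each criterion used by more than one category to those categories.

    Criteria are compared after stripping whitespace and lowercasing, so only
    exact (case-insensitive) duplicates are caught.
    """
    owners = defaultdict(set)
    for entry in schema:
        for criterion in entry["criteria"]:
            owners[criterion.strip().lower()].add(entry["category"])
    return {c: sorted(cats) for c, cats in owners.items() if len(cats) > 1}


# Adding a hypothetical "lease" category that reuses a contract criterion
# makes the two categories overlap, and the lint reports it.
with_lease = SCHEMA + [
    {
        "category": "lease",
        "criteria": ["Includes a signature block", "mentions monthly rent"],
    }
]
print(shared_criteria(SCHEMA))  # {}
print(shared_criteria(with_lease))  # {'includes a signature block': ['contract', 'lease']}
```

Exact matching only catches copy-pasted criteria. Near-duplicates such as "has a signature" and "includes a signature block" still need manual review or a trial run on sample documents.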
Use Enough Categories
You must provide at least two categories. Classify always returns the best match from your schema, so even if none of the categories is a perfect fit, it will return the closest one. If you need an escape hatch, add a catch-all category for documents that don't fit any of your specific categories.
Use Confidence Scores to Refine Your Schema
The confidence breakdown in the response tells you exactly which criteria matched, and which didn't, for every category. Use it to iterate on your schema:
- Run Classify on a batch of sample documents.
- Check documents where the winning category had low confidence (e.g., below 0.7).
- Inspect criteria_confidence to see which criteria are too broad, too narrow, or overlapping with other categories.
- Adjust your criteria and re-run until confidence scores improve.
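The triage in the steps above can be sketched in Python. This sketch assumes each result is a dict holding the winning "category", its "confidence", and a criteria_confidence mapping from category to per-criterion scores; that shape, the "doc" key, and the sample values are hypothetical, so check Response Format for the real fields. Treating a losing category's criterion that scores at or above the threshold as a sign of overlap is an illustrative heuristic, not something Classify reports directly.

```python
# Threshold for "low confidence"; 0.7 is the example value from the steps above.
LOW_CONFIDENCE = 0.7

# Hypothetical sample results. The real response fields and nesting are
# documented under Response Format; this shape is an assumption.
results = [
    {
        "doc": "a.pdf",
        "category": "contract",
        "confidence": 0.92,
        "criteria_confidence": {
            "contract": {"includes a signature block": 0.95},
            "lease": {"mentions monthly rent": 0.05},
        },
    },
    {
        "doc": "b.pdf",
        "category": "contract",
        "confidence": 0.55,
        "criteria_confidence": {
            "contract": {
                "contains the words 'Terms and Conditions'": 0.2,
                "includes a signature block": 0.9,
            },
            "lease": {
                "includes a signature block": 0.9,
                "mentions monthly rent": 0.1,
            },
        },
    },
]


def needs_review(batch, threshold=LOW_CONFIDENCE):
    """Return the results whose winning category scored below the threshold."""
    return [r for r in batch if r["confidence"] < threshold]


def weak_and_overlapping(result, threshold=LOW_CONFIDENCE):
    """Split one result's criteria into two lists.

    weak: criteria of the winning category that scored below the threshold
        (candidates for being too narrow or not visible on the page).
    overlapping: (category, criterion) pairs from other categories that scored
        at or above the threshold (candidates for overlap with the winner).
    """
    winner = result["category"]
    weak, overlapping = [], []
    for category, scores in result["criteria_confidence"].items():
        for criterion, score in scores.items():
            if category == winner and score < threshold:
                weak.append(criterion)
            elif category != winner and score >= threshold:
                overlapping.append((category, criterion))
    return weak, overlapping


# Print a short report for every low-confidence document in the batch.
for r in needs_review(results):
    weak, overlapping = weak_and_overlapping(r)
    print(r["doc"], "weak:", weak, "overlapping:", overlapping)
```

Here only b.pdf is flagged: its "Terms and Conditions" criterion scored low, and the shared signature-block criterion scored high for both contract and lease, which suggests making one of those categories more specific.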
Related
Classify Overview
Quick start, request parameters, and pipeline integration.
Response Format
Confidence scores, per-criterion reasoning, and all response fields.