Confidence scores, per-criterion reasoning, and all response fields
Classify returns a category label, a per-criterion confidence breakdown for every category in your schema, and timing information. This page explains every field in the response.
Classify doesn’t just return a label. It returns a per-criterion confidence breakdown for every category you defined, not just the winner. This gives you structured, interpretable reasoning for why a document was classified the way it was.
Each criterion you define becomes a yes/no evaluation. The confidence score for a category is the fraction of its criteria that matched (high). For example, "invoice" scores 1.0 when all 3 of its criteria match, while "contract" scores 0.33 when only 1 of its 3 criteria matches.
This structured output is useful in several ways:
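The scoring rule described above can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the Classify implementation: the function name `category_confidence`, the criterion texts, and the dict-of-booleans shape are all assumptions made for the example.

```python
def category_confidence(criteria_results: dict[str, bool]) -> float:
    """Return the fraction of criteria that matched (0.0 if none are defined)."""
    if not criteria_results:
        return 0.0
    # True counts as 1 and False as 0, so sum() gives the number of matches.
    return sum(criteria_results.values()) / len(criteria_results)


# Hypothetical yes/no outcomes for two categories with 3 criteria each.
invoice = {
    "has an invoice number": True,
    "lists line items with prices": True,
    "states a total amount due": True,
}
contract = {
    "contains signature blocks": False,
    "names two or more parties": True,
    "defines terms and obligations": False,
}

print(round(category_confidence(invoice), 2))   # 1.0
print(round(category_confidence(contract), 2))  # 0.33
```

Because the score is a simple ratio, a category with few criteria can only take a few distinct values (0, 0.33, 0.67, 1.0 for three criteria), which is worth keeping in mind when choosing thresholds.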
Auditability. You can trace exactly which criteria drove a classification decision. If an invoice was misclassified, inspect the criteria_confidence to see which criteria matched or didn’t.
Threshold-based routing. Instead of blindly trusting result.category, check the confidence score. If the top category scores below a threshold (e.g., 0.6), flag it for human review rather than routing it automatically.
Ambiguity detection. If two categories score similarly (e.g., 0.67 and 0.55), the document may be ambiguous. Use this signal to trigger a different workflow or request additional information.
Schema refinement. Low-confidence classifications across your pipeline tell you which criteria need to be more specific. The per-criterion breakdown pinpoints exactly which criteria are too broad or overlapping.
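The routing, ambiguity, and auditability ideas in the list above could be combined roughly as follows. This is a minimal sketch that works on plain dictionaries rather than the actual response object: the `route` and `unmatched_criteria` helpers, the `margin` parameter, the return strings, and the assumed shape of the per-category scores are hypothetical, so adapt them to the fields your response actually contains.

```python
def route(scores: dict[str, float], threshold: float = 0.6, margin: float = 0.15) -> str:
    """Decide what to do with a document given per-category confidence scores.

    scores:    category name -> confidence in [0, 1] (assumed shape)
    threshold: minimum top score required for automatic routing
    margin:    minimum gap between the top two scores; smaller gaps are ambiguous
    """
    if not scores:
        return "human_review"
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    top_label, top_score = ranked[0]
    if top_score < threshold:
        return "human_review"   # top category is not confident enough
    if len(ranked) > 1 and top_score - ranked[1][1] < margin:
        return "ambiguous"      # two categories scored too close together
    return f"auto:{top_label}"


def unmatched_criteria(criteria_results: dict[str, bool]) -> list[str]:
    """List the criteria that did not match, for auditing a decision."""
    return [name for name, matched in criteria_results.items() if not matched]


print(route({"invoice": 1.0, "contract": 0.33}))   # auto:invoice
print(route({"invoice": 0.5, "contract": 0.33}))   # human_review
print(route({"invoice": 0.67, "contract": 0.55}))  # ambiguous
```

The threshold of 0.6 comes from the example in the list; the margin of 0.15 is an arbitrary starting point and should be tuned against documents you have already reviewed by hand.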
Think of criteria as a structured checklist. This makes the classification decision transparent and programmatically accessible, not just a black-box label.