Citations tell you exactly where each extracted value came from in the document. When enabled, every field includes bounding box coordinates pointing to the source text. This matters for:
  • Verification: Confirm extractions are correct by checking source text
  • Compliance: Maintain audit trails for regulated workflows
  • Debugging: See where the model looked when values are wrong
  • User experience: Let users click from extracted data to the original location
To enable citations, set citations.enabled to True in the request settings:
result = client.extract.run(
    input=upload,
    instructions={"schema": schema},
    settings={
        "citations": {"enabled": True}
    }
)

Response Structure

With citations enabled, each value becomes an object with value and citations:
{
  "result": {
    "invoice_total": {
      "value": 1575.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total Due: $1,575.00",
          "bbox": {
            "left": 0.65,
            "top": 0.82,
            "width": 0.25,
            "height": 0.03,
            "page": 1,
            "original_page": 1
          },
          "confidence": "high",
          "granular_confidence": {
            "extract_confidence": 0.95,
            "parse_confidence": 0.91
          },
          "parentBlock": {
            "type": "Table",
            "content": "Subtotal: $1,500.00\nTax: $75.00\nTotal Due: $1,575.00",
            "bbox": {"left": 0.60, "top": 0.75, "width": 0.35, "height": 0.12, "page": 1}
          }
        }
      ]
    }
  }
}
Fields:
  • type: Block type (Text, Table, Key Value, etc.)
  • content: The source text
  • bbox: Bounding box coordinates (normalized 0-1 for PDFs/images)
  • confidence: "high" or "low"
  • granular_confidence: Numeric scores (extract_confidence, parse_confidence) between 0-1
  • parentBlock: The larger Parse block containing this citation, for context
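The field list above can be mirrored in a small typed model when you work with the raw JSON response. This is a minimal sketch, not part of the SDK: the BBox and Citation classes and the parse_citation helper are hypothetical names, and the sketch assumes each citation arrives as a plain dict shaped like the example response. It skips parentBlock to stay short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BBox:
    left: float
    top: float
    width: float
    height: float
    page: int
    original_page: Optional[int] = None  # absent on parentBlock in the example


@dataclass
class Citation:
    type: str
    content: str
    bbox: BBox
    confidence: str  # "high" or "low"
    extract_confidence: Optional[float] = None
    parse_confidence: Optional[float] = None


def parse_citation(raw: dict) -> Citation:
    """Build a Citation from one raw citation dict (hypothetical helper)."""
    box = raw["bbox"]
    # granular_confidence is missing when numerical_confidence is disabled
    granular = raw.get("granular_confidence") or {}
    return Citation(
        type=raw["type"],
        content=raw["content"],
        bbox=BBox(
            left=box["left"],
            top=box["top"],
            width=box["width"],
            height=box["height"],
            page=box["page"],
            original_page=box.get("original_page"),
        ),
        confidence=raw["confidence"],
        extract_confidence=granular.get("extract_confidence"),
        parse_confidence=granular.get("parse_confidence"),
    )
```

Optional fields default to None, so the same model can hold citations returned with numeric scores disabled.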

Working with Citations

Accessing a scalar field:
invoice_number = result.result["invoice_number"]
print(f"Value: {invoice_number.value}")

if invoice_number.citations:
    citation = invoice_number.citations[0]
    print(f"Found on page {citation.bbox.page}")
    print(f"Source text: {citation.content}")
Looping through array items:
for item in result.result["line_items"]:
    amount = item["amount"]
    print(f"${amount.value}")
    
    if amount.citations:
        print(f"  Found on page {amount.citations[0].bbox.page}")

Bounding Box Coordinates

For PDFs and images, coordinates are normalized to [0, 1] relative to the page dimensions, so left: 0.5 means halfway across the page.
  • page: Page number in the processed result
  • original_page: Page number in the original document, which differs from page when you use page ranges
To convert to pixels, multiply by the page dimensions:
x_px = bbox["left"] * page_width_px
y_px = bbox["top"] * page_height_px
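To draw a highlight you usually need the full rectangle, not just its corner. The sketch below extends the conversion above to all four edges. The bbox_to_pixels name is hypothetical, the bbox is taken as a plain dict, and the origin is assumed to be the top-left corner of the page, which the left/top naming suggests but this page does not state.

```python
from typing import Tuple


def bbox_to_pixels(
    bbox: dict, page_width_px: int, page_height_px: int
) -> Tuple[int, int, int, int]:
    """Return (x0, y0, x1, y1) in pixels for a normalized bounding box.

    Assumes a top-left origin and a bbox dict with left/top/width/height.
    """
    x0 = bbox["left"] * page_width_px
    y0 = bbox["top"] * page_height_px
    x1 = (bbox["left"] + bbox["width"]) * page_width_px
    y1 = (bbox["top"] + bbox["height"]) * page_height_px
    return round(x0), round(y0), round(x1), round(y1)


# The invoice_total bbox from the example response, on a 1000 x 2000 px page
rect = bbox_to_pixels(
    {"left": 0.65, "top": 0.82, "width": 0.25, "height": 0.03},
    page_width_px=1000,
    page_height_px=2000,
)
print(rect)  # (650, 1640, 900, 1700)
```

The page dimensions must come from the image you render for that page. The citation itself carries only normalized values.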

Spreadsheet Citations

Excel and CSV files use cell coordinates instead of normalized positions:
  • left: Column number (1 = A, 2 = B, 3 = C)
  • top: Row number (1-indexed)
  • page: Sheet index (1 = first sheet)
A citation with {"left": 3, "top": 15, "page": 2} points to cell C15 on the second sheet.
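Turning a spreadsheet citation into a familiar cell reference takes a small conversion from column number to letters. The helpers below are a hypothetical sketch. The mapping beyond column Z (27 = AA, 28 = AB) is assumed to follow the usual Excel convention, which this page does not spell out.

```python
def column_letters(col: int) -> str:
    """Convert a 1-indexed column number to letters (1 -> A, 27 -> AA)."""
    if col < 1:
        raise ValueError("column numbers are 1-indexed")
    letters = ""
    while col > 0:
        # Shift to 0-indexed so that 26 maps to Z instead of rolling over
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def cell_reference(bbox: dict) -> str:
    """Return the cell reference, such as C15, for a spreadsheet citation bbox."""
    return f"{column_letters(int(bbox['left']))}{int(bbox['top'])}"


print(cell_reference({"left": 3, "top": 15, "page": 2}))  # C15
```

The sheet is identified separately by page, so pair the cell reference with the sheet index when showing it to users.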

Confidence Scores

Each citation includes a confidence field with a categorical value ("high" or "low"). By default in v3, numerical_confidence is enabled, which adds granular_confidence with numeric 0-1 scores:
{
  "confidence": "high",
  "granular_confidence": {
    "extract_confidence": 0.95,
    "parse_confidence": 0.91
  }
}
  • extract_confidence: How confident the LLM is about the extraction
  • parse_confidence: How confident the OCR/parsing is about the underlying text
To disable numeric scores and only get categorical confidence:
settings={
    "citations": {
        "enabled": True,
        "numerical_confidence": False  # Only return "high"/"low"
    }
}
Low parse_confidence suggests OCR errors. Low extract_confidence suggests the model was uncertain about interpretation.
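One way to act on these two scores is to route each citation to a different kind of review. The sketch below is illustrative only: the diagnose function, its labels, and the 0.8 threshold are assumptions, not documented values, and citations are taken as plain dicts. Tune the threshold against your own documents.

```python
def diagnose(citation: dict, threshold: float = 0.8) -> str:
    """Suggest a review action for one citation (hypothetical helper).

    The threshold is an arbitrary illustrative cutoff, not a documented one.
    """
    granular = citation.get("granular_confidence")
    if not granular:
        # Numeric scores disabled: fall back to the categorical value
        return "review" if citation.get("confidence") == "low" else "ok"

    low_parse = granular["parse_confidence"] < threshold
    low_extract = granular["extract_confidence"] < threshold

    if low_parse and low_extract:
        return "check_both"
    if low_parse:
        return "check_ocr"  # the underlying text may be misread
    if low_extract:
        return "check_interpretation"  # the model was unsure what the text means
    return "ok"
```

For example, a citation with parse_confidence 0.40 and extract_confidence 0.95 returns "check_ocr", which points a reviewer at the source text, not the schema.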

Constraints

  • Citations disable chunking: The document is processed as a single unit to maintain precise coordinate mapping.
  • Empty citations: Values that were inferred (not directly found in the document) may have empty citations. Always check that field.citations is non-empty before accessing its items.
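To find inferred values in bulk, you can walk the result and list every field whose citations are empty. This is a sketch under stated assumptions: it works on the raw JSON form of the response (plain dicts and lists), treats any dict with both value and citations keys as a cited field, and uncited_fields is a hypothetical name.

```python
def uncited_fields(node, path=""):
    """Return the paths of fields whose citations list is empty or missing."""
    missing = []
    if isinstance(node, dict):
        if "value" in node and "citations" in node:
            # A cited field: record it when there is nothing to point at
            if not node["citations"]:
                missing.append(path)
        else:
            for key, child in node.items():
                child_path = f"{path}.{key}" if path else key
                missing.extend(uncited_fields(child, child_path))
    elif isinstance(node, list):
        for index, child in enumerate(node):
            missing.extend(uncited_fields(child, f"{path}[{index}]"))
    return missing


sample = {
    "invoice_total": {"value": 1575.00, "citations": [{"content": "Total Due: $1,575.00"}]},
    "currency": {"value": "USD", "citations": []},
    "line_items": [{"amount": {"value": 10, "citations": None}}],
}
print(uncited_fields(sample))  # ['currency', 'line_items[0].amount']
```

Fields reported here are candidates for manual verification, since there is no source location to check them against.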

Studio Visualization

Every response includes a studio_link. In Studio, citations are interactive:
  • Click an extracted field to highlight its source in the document
  • Click a highlight to jump to the corresponding field