Citations

Citations link each extracted value to its source location in the document. When enabled, every field in your extraction result includes coordinates pointing to where the value was found. This matters for three reasons:

Verification: Confirm extractions are correct by seeing the source text
Debugging: When values are wrong, citations show where the model looked
User experience: Let users click from extracted data to the original location

Enabling Citations

Add `citations.enabled` to your extraction settings:

result = client.extract.run(
    input=upload.file_id,
    instructions={
        "schema": schema,
        "system_prompt": "Extract invoice details."
    },
    settings={
        "citations": {
            "enabled": True
        }
    }
)

Citation Structure

With citations enabled, the response format changes. The result becomes an object (instead of an array), and each value is wrapped with citation data:

{
  "result": {
    "invoice_total": {
      "value": 1575.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total Due: $1,575.00",
          "bbox": {
            "left": 0.65,
            "top": 0.82,
            "width": 0.25,
            "height": 0.03,
            "page": 1,
            "original_page": 1
          },
          "confidence": "high",
          "granular_confidence": {
            "extract_confidence": 0.95,
            "parse_confidence": 0.91
          },
          "parentBlock": {
            "type": "Table",
            "content": "Invoice Total\nSubtotal: $1,500.00\nTax: $75.00\nTotal Due: $1,575.00",
            "bbox": {"left": 0.60, "top": 0.75, "width": 0.35, "height": 0.12, "page": 1}
          }
        }
      ]
    }
  }
}
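If downstream code only needs the plain values, the wrapped structure can be flattened again. The following is a minimal sketch that works on the response parsed into plain Python dicts (as in the JSON above), not on SDK objects. The helper name `unwrap_values` is hypothetical, and it assumes that a wrapped leaf is any object carrying both a `value` and a `citations` key.

```python
def unwrap_values(node):
    """Recursively strip citation wrappers, leaving only the extracted values."""
    if isinstance(node, dict):
        # Assumption: a wrapped leaf has both "value" and "citations" keys.
        if "value" in node and "citations" in node:
            return unwrap_values(node["value"])
        return {key: unwrap_values(child) for key, child in node.items()}
    if isinstance(node, list):
        return [unwrap_values(item) for item in node]
    return node


# Illustrative data shaped like the example response above.
response = {
    "result": {
        "invoice_total": {
            "value": 1575.00,
            "citations": [{"type": "Table", "content": "Total Due: $1,575.00"}],
        },
        "line_items": [
            {"amount": {"value": 75.00, "citations": []}},
        ],
    }
}

plain = unwrap_values(response["result"])
print(plain)
```

A schema that itself defines fields named `value` and `citations` on the same object would be misread by this heuristic, so treat it as a starting point.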

Citation Fields

Field	Description
`type`	The block type where the value was found: `Text`, `Table`, `Key Value`, `Title`, etc.
`content`	The source text the value was extracted from. May include more context than the value itself.
`bbox`	Bounding box coordinates for the source location.
`confidence`	Overall confidence as `"high"` or `"low"`.
`granular_confidence`	Detailed scores: `extract_confidence` (0-1) and `parse_confidence` (0-1).
`parentBlock`	The larger Parse block containing this citation. Useful for understanding context.

Bounding Box Coordinates

Coordinates are normalized to the range [0, 1] relative to page dimensions:

Coordinate	Meaning
`left`	Distance from the left edge (0 = left edge, 1 = right edge)
`top`	Distance from the top edge (0 = top, 1 = bottom)
`width`	Width as fraction of page width
`height`	Height as fraction of page height
`page`	Page number (1-indexed) in the processed result
`original_page`	Page number in the original document

To convert to pixel coordinates:

def to_pixels(bbox, page_width, page_height):
    return {
        "left": bbox.left * page_width,
        "top": bbox.top * page_height,
        "width": bbox.width * page_width,
        "height": bbox.height * page_height
    }

# For a US Letter page rendered at 72 DPI (612x792 points, one pixel per point)
pixels = to_pixels(citation.bbox, 612, 792)
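When the goal is to crop or highlight the cited region in a rendered page image, integer corner coordinates are often more convenient than left/top/width/height. The sketch below assumes the bbox is available as a plain dict and that you know the pixel size of the rendered page image; `bbox_to_crop_box` is a hypothetical helper, not part of the SDK.

```python
def bbox_to_crop_box(bbox, image_width, image_height):
    """Convert a normalized bbox dict to an integer (left, top, right, bottom) box."""

    def clamp(value, upper):
        # Round to the nearest pixel and keep the result inside the image.
        return max(0, min(upper, round(value)))

    left = clamp(bbox["left"] * image_width, image_width)
    top = clamp(bbox["top"] * image_height, image_height)
    right = clamp((bbox["left"] + bbox["width"]) * image_width, image_width)
    bottom = clamp((bbox["top"] + bbox["height"]) * image_height, image_height)
    return (left, top, right, bottom)


# The bbox from the example citation, applied to a 1000x2000 pixel rendering.
bbox = {"left": 0.65, "top": 0.82, "width": 0.25, "height": 0.03, "page": 1}
box = bbox_to_crop_box(bbox, image_width=1000, image_height=2000)
print(box)
```

The (left, top, right, bottom) ordering matches what common imaging libraries expect for crop regions, for example Pillow's `Image.crop`.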

Working with Citations

Accessing Citation Data

result = client.extract.run(
    input=upload.file_id,
    instructions={"schema": schema},
    settings={"citations": {"enabled": True, "numerical_confidence": True}}
)

# Access a scalar field
invoice_number = result.result["invoice_number"]
print(f"Value: {invoice_number.value}")

if invoice_number.citations:
    citation = invoice_number.citations[0]
    print(f"Found on page {citation.bbox.page}")
    print(f"Source text: {citation.content}")
    print(f"Confidence: {citation.confidence}")

Array Citations

For array fields, each item in the array has its own citations:

for i, item in enumerate(result.result["line_items"]):
    description = item["description"]
    amount = item["amount"]
    
    print(f"Item {i + 1}: {description.value} - ${amount.value}")
    
    if amount.citations:
        print(f"  Amount found at: page {amount.citations[0].bbox.page}")
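For highlight overlays it can help to collect every citation in the result, including those nested inside arrays, and group them by page. The following is a rough sketch over the response parsed as plain dicts; the helper names `iter_citations` and `citations_by_page` and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def iter_citations(node, path=""):
    """Yield (field_path, citation) pairs from a citation-wrapped result tree."""
    if isinstance(node, dict):
        # Assumption: a wrapped leaf has both "value" and "citations" keys.
        if "value" in node and "citations" in node:
            for citation in node["citations"] or []:
                yield path, citation
            return
        for key, child in node.items():
            yield from iter_citations(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_citations(item, f"{path}[{index}]")


def citations_by_page(result):
    """Map each page number to the field paths cited on that page."""
    pages = defaultdict(list)
    for path, citation in iter_citations(result):
        pages[citation["bbox"]["page"]].append(path)
    return dict(pages)


# Illustrative data: one scalar field and one array item.
sample = {
    "invoice_number": {"value": "INV-1", "citations": [{"bbox": {"page": 1}}]},
    "line_items": [
        {
            "description": {"value": "Widget", "citations": [{"bbox": {"page": 2}}]},
            "amount": {"value": 10.0, "citations": [{"bbox": {"page": 2}}]},
        }
    ],
}

grouped = citations_by_page(sample)
print(grouped)
```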

Filtering by Confidence

Use confidence scores to flag uncertain extractions:

LOW_CONFIDENCE_THRESHOLD = 0.7

for field_name, field_data in result.result.items():
    if hasattr(field_data, 'citations') and field_data.citations:
        # `confidence` is a "high"/"low" label, so compare the numeric extract score instead
        confidence = field_data.citations[0].granular_confidence.extract_confidence
        if confidence < LOW_CONFIDENCE_THRESHOLD:
            print(f"Low confidence ({confidence:.2f}): {field_name} = {field_data.value}")
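Because `confidence` is described above as a `"high"`/`"low"` label while `granular_confidence` carries numeric scores, a review filter may need to cope with either form. The sketch below works on citations parsed as plain dicts; the helper names and the fallback order are assumptions rather than documented behavior.

```python
def is_low_confidence(citation, threshold=0.7):
    """Return True when a citation dict should be routed to manual review."""
    granular = citation.get("granular_confidence") or {}
    score = granular.get("extract_confidence")
    if isinstance(score, (int, float)):
        return score < threshold
    confidence = citation.get("confidence")
    if isinstance(confidence, (int, float)):
        return confidence < threshold
    # Fall back to the label: anything other than "high" counts as uncertain.
    return confidence != "high"


def low_confidence_fields(result, threshold=0.7):
    """List top-level field names whose first citation looks uncertain."""
    flagged = []
    for name, field in result.items():
        citations = field.get("citations") if isinstance(field, dict) else None
        if citations and is_low_confidence(citations[0], threshold):
            flagged.append(name)
    return flagged


# Illustrative data covering numeric scores, label-only, and empty citations.
fields = {
    "invoice_total": {
        "value": 1575.0,
        "citations": [{
            "confidence": "high",
            "granular_confidence": {"extract_confidence": 0.95, "parse_confidence": 0.91},
        }],
    },
    "due_date": {"value": "2024-01-01", "citations": [{"confidence": "low"}]},
    "days_until_due": {"value": 30, "citations": []},
}

flagged = low_confidence_fields(fields)
print(flagged)
```

Fields with no citations are skipped here rather than flagged; whether they deserve review is a separate decision.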

Spreadsheet Citations

Excel and other spreadsheet formats use cell coordinates instead of normalized positions.

Coordinate Differences

Aspect	PDFs/Images	Spreadsheets
`left`	Fraction (0-1)	Column number (1 = A, 2 = B)
`top`	Fraction (0-1)	Row number (1-indexed)
`width`	Fraction (0-1)	Columns spanned
`height`	Fraction (0-1)	Rows spanned
`page`	Page number	Sheet index (1 = first sheet)

Example Spreadsheet Citation

{
  "bbox": {
    "left": 3,        // Column C
    "top": 15,        // Row 15
    "width": 1,       // Single column
    "height": 1,      // Single row
    "page": 2,        // Second sheet
    "original_page": 2
  }
}

This points to cell C15 on the second sheet. The coordinates map directly to Excel’s A1 notation.

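def bbox_to_cell(bbox):
    """Convert spreadsheet bbox to cell reference (handles columns past Z, such as AA)."""
    col, letters = int(bbox.left), ""
    while col > 0:
        # Column letters are a bijective base-26 number: 1 = A, 26 = Z, 27 = AA
        col, remainder = divmod(col - 1, 26)
        letters = chr(ord('A') + remainder) + letters
    return f"{letters}{int(bbox.top)}"

# bbox with left=3, top=15 becomes "C15"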
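A spreadsheet citation can span more than one cell, in which case an A1-style range is more useful than a single cell reference. The following is a minimal sketch over a bbox given as a plain dict. It assumes that `width` and `height` count the spanned columns and rows including the starting cell, consistent with the single-cell example above; the helper names are hypothetical.

```python
def column_letters(index):
    """Convert a 1-based column index to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def bbox_to_range(bbox):
    """Convert a spreadsheet bbox dict to an A1 cell or range reference."""
    first_col, first_row = int(bbox["left"]), int(bbox["top"])
    # Assumption: width/height include the starting cell, so a span of 1 is one cell.
    last_col = first_col + int(bbox["width"]) - 1
    last_row = first_row + int(bbox["height"]) - 1
    start = f"{column_letters(first_col)}{first_row}"
    end = f"{column_letters(last_col)}{last_row}"
    return start if start == end else f"{start}:{end}"


single = bbox_to_range({"left": 3, "top": 15, "width": 1, "height": 1, "page": 2})
block = bbox_to_range({"left": 27, "top": 2, "width": 2, "height": 3, "page": 1})
print(single, block)
```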

Constraints and Limitations

Citations Disable Chunking

Citations require knowing exactly where each piece of content came from. Chunking merges content across boundaries, which would make citation coordinates ambiguous. When you enable citations:

Chunking is automatically disabled in the parsing step
The document is processed as a single unit
This may increase processing time for very long documents

If you have explicit chunking settings in your parsing configuration, they’ll be ignored when citations are enabled.

Streaming Array Extract Incompatible

The streaming mode for array extraction cannot be used with citations. If you need both complete arrays and citations, keep array extraction in its default mode:

# Works: default array_extract mode with citations
result = client.extract.run(
    input=upload.file_id,
    instructions={"schema": schema},
    settings={
        "array_extract": True,
        "citations": {"enabled": True}
    }
)

The default array extraction mode (not streaming) fully supports citations.

Empty Citations

Citations may be empty for fields that were inferred rather than directly extracted:

# If the document says "Payment Terms: Net 30"
# And your schema has a field for "days_until_due"
# The model extracts 30 but may not have a citation since "30" was derived from "Net 30"

Always check `if field.citations:` before accessing citation data.
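One way to make that check hard to forget is to route all access through a small helper that returns `None` when no citation exists. This is a sketch over fields parsed as plain dicts; `first_citation` and `describe_source` are hypothetical names, not SDK functions.

```python
def first_citation(field):
    """Return the first citation dict for a wrapped field, or None if absent."""
    citations = field.get("citations") or []
    return citations[0] if citations else None


def describe_source(name, field):
    """Build a one-line description of where a field's value came from."""
    citation = first_citation(field)
    if citation is None:
        return f"{name} = {field['value']} (no citation; value may be inferred)"
    page = citation["bbox"]["page"]
    return f"{name} = {field['value']} (page {page}: {citation['content']!r})"


# Illustrative data: one inferred field and one directly cited field.
inferred = {"value": 30, "citations": []}
cited = {
    "value": "Net 30",
    "citations": [{"content": "Payment Terms: Net 30", "bbox": {"page": 1}}],
}

print(describe_source("days_until_due", inferred))
print(describe_source("payment_terms", cited))
```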

Viewing in Studio

Every extraction response includes a studio_link. In Studio, citations become interactive:

Click an extracted field to highlight its source in the document
Click a highlight to jump to the corresponding field
See all citations overlaid on the document at once

This is particularly useful for debugging when extractions don’t match expectations. You can see exactly what the model identified as the source for each value.

Related

Extract Overview

Endpoint basics and parameters.

Response Format

Full response structure details.

Best Practices

Schema design and prompt tips.

Array Extraction

Handle long documents with repeating data.
