Overview

Reducto provides bounding box citations so you can trace extracted data back to its original source inside a document.
Citations matter most for compliance, debugging, and trust in industries where extracted data must be grounded in verifiable evidence.

Why use citations?

  • Traceability — Confirm where each extracted value came from in the source file.
  • Compliance — Maintain audit trails for regulated workflows.
  • Debugging — Compare schema outputs against original text to fix errors.
  • User experience — In Studio, click between the output field and the highlighted bounding box on the document.

How to enable citations

result = client.extract.run(
    document_url=upload,
    schema=schema,
    system_prompt=system_prompt,
    # Request bounding box citations alongside the extracted data
    generate_citations=True,
)

How citations work

When generate_citations is enabled, Reducto includes bounding box metadata for each extracted field. Citation data (citations) is returned in a separate field from result, which holds the data in your specified schema structure. Each bounding box (bbox) records where the cited text sits in the document.

Sample response
{
  "citations": [
    {
      "sample_extracted_field": [
        {
          "bbox": {
            "left": 0.1,
            "top": 0.2,
            "width": 0.3,
            "height": 0.05,
            "page": 1,
            "original_page": 1
          },
          "confidence": "high",
          "content": "granular citation",
          "image_url": null,
          "parentBlock": {
            "bbox": {
              "left": 0.1,
              "top": 0.9,
              "width": 0.8,
              "height": 0.05,
              "page": 1
            },
            "block_type": "Text",
            "confidence": "high",
            "content": "This is the full sentence with the granular citation."
          },
          "type": "Text"
        }
      ]
    }
  ],
  "result": ....
}
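
As a minimal sketch of reading this structure, the helper below walks a response that has already been loaded into a Python dict with the same shape as the sample above. The function name iter_citations is hypothetical and not part of the Reducto SDK, and the flat field-to-list layout is an assumption taken from the sample; nested schemas may need a recursive walk.

```python
from typing import Any, Dict, Iterator, Tuple


def iter_citations(response: Dict[str, Any]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (field_name, citation) pairs.

    Assumes the flat shape shown in the sample: "citations" is a list of
    dicts that map each field name to a list of citation objects.
    """
    for entry in response.get("citations", []):
        for field_name, field_citations in entry.items():
            for citation in field_citations:
                yield field_name, citation


# Trimmed copy of the sample response, used only for illustration.
sample = {
    "citations": [
        {
            "sample_extracted_field": [
                {
                    "bbox": {"left": 0.1, "top": 0.2, "width": 0.3,
                             "height": 0.05, "page": 1, "original_page": 1},
                    "confidence": "high",
                    "content": "granular citation",
                }
            ]
        }
    ]
}

for name, citation in iter_citations(sample):
    print(name, citation["content"], citation["bbox"]["page"])
```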


Fields in each citation:
  • left, top, width, height: bounding box coordinates normalized to the page dimensions (for non-spreadsheet formats, values fall in the [0, 1] range).
  • parentBlock: the Parse block that contains the extracted data. Because citations can be very granular, the parent block supplies the surrounding context.
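
To draw a highlight on a rendered page image, the normalized values need to be scaled by the image size. The sketch below shows that arithmetic. The function name bbox_to_pixels is hypothetical, and it assumes the origin is the top-left corner of the page with top measured downward, which this page does not state explicitly.

```python
from typing import Dict, Tuple


def bbox_to_pixels(
    bbox: Dict[str, float], page_width_px: int, page_height_px: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized bbox to (left, top, right, bottom) in pixels.

    Assumption: (0, 0) is the top-left corner of the page and values
    grow to the right and downward.
    """
    left = bbox["left"] * page_width_px
    top = bbox["top"] * page_height_px
    right = (bbox["left"] + bbox["width"]) * page_width_px
    bottom = (bbox["top"] + bbox["height"]) * page_height_px
    return round(left), round(top), round(right), round(bottom)


# The bbox from the sample response, on a page rendered at 1000 x 2000 px.
sample_bbox = {"left": 0.1, "top": 0.2, "width": 0.3, "height": 0.05}
print(bbox_to_pixels(sample_bbox, 1000, 2000))  # (100, 400, 400, 500)
```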
In Studio, citations are two-way links:
  • Click an extracted field to see its highlight in the document.
  • Click a highlighted bounding box to locate the field in the output.

Spreadsheet citations

Excel and other spreadsheet formats handle citations differently from PDFs and images.

Coordinate system:
  • Excel: Uses actual row/column positions (1-indexed). For example, cell A1 has coordinates left: 1, top: 1, width: 1, height: 1.
  • Other formats: Use normalized coordinates in the [0, 1] range, relative to page dimensions.
Page field:
  • Excel: The page field represents the sheet index (1-indexed). Sheet 1 = page 1, Sheet 2 = page 2, etc.
  • Other formats: The page field represents the actual page number in the document.
Example Excel citation:
JSON output
{
  "bbox": {
    "left": 2,      // Column B (1-indexed)
    "top": 5,       // Row 5 (1-indexed) 
    "width": 1,     // 1 column wide
    "height": 1,    // 1 row tall
    "page": 1,      // First sheet
    "original_page": 1
  }
}
This allows for precise cell-level citations that correspond directly to Excel’s native coordinate system.
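
As an illustration, an Excel citation can be turned back into an A1-style cell reference. The sketch below assumes left is the 1-indexed column, top is the 1-indexed row, and width and height count columns and rows, as in the example above. The helper names column_letter and bbox_to_a1 are hypothetical and not part of the Reducto SDK.

```python
from typing import Dict


def column_letter(index: int) -> str:
    """Convert a 1-indexed column number to Excel letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def bbox_to_a1(bbox: Dict[str, float]) -> str:
    """Return an A1-style reference for an Excel citation bbox.

    Assumption: width and height are whole column and row counts.
    """
    first_col, first_row = int(bbox["left"]), int(bbox["top"])
    last_col = first_col + int(bbox["width"]) - 1
    last_row = first_row + int(bbox["height"]) - 1
    start = f"{column_letter(first_col)}{first_row}"
    end = f"{column_letter(last_col)}{last_row}"
    # A single cell is written as "B5"; a multi-cell range as "A1:B3".
    return start if start == end else f"{start}:{end}"


excel_bbox = {"left": 2, "top": 5, "width": 1, "height": 1, "page": 1}
print(bbox_to_a1(excel_bbox))  # B5
```

The page field is left out of the reference here because it is a sheet index rather than a sheet name; mapping it to a name would require the workbook itself.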