Overview
Reducto provides bounding box citations so you can trace extracted data back to its original source inside a document. This feature is especially important for compliance, debugging, and trust in industries that need data grounded in real evidence.
Why use citations?
- Traceability — Confirm where each extracted value came from in the source file.
- Compliance — Maintain audit trails for regulated workflows.
- Debugging — Compare schema outputs against original text to fix errors.
- User experience — In Studio, click between the output field and the highlighted bounding box on the document.
How to enable citations
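As a minimal sketch, enabling citations amounts to setting generate_citations to true on the extraction request. Only the generate_citations flag comes from this page; the helper build_extract_request, the document_url and schema key names, and the placement of the flag at the top level of the body are assumptions for illustration, so check the API reference for the exact request shape.

```python
import json


def build_extract_request(document_url: str, schema: dict) -> dict:
    """Build a request body with citations turned on (hypothetical helper).

    Only generate_citations is taken from this page; the other key
    names are placeholders.
    """
    return {
        "document_url": document_url,  # assumed key name
        "schema": schema,              # assumed key name
        "generate_citations": True,    # flag described on this page
    }


body = build_extract_request(
    "https://example.com/invoice.pdf",
    {"type": "object", "properties": {"total": {"type": "number"}}},
)
print(json.dumps(body, indent=2))
```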
How citations work
When generate_citations is enabled, Reducto includes bounding box metadata for each extracted field. Citation data (citations) is returned in a separate field from result, which holds the data in your specified schema structure.
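The separation between result and citations can be sketched as follows. The response below is hypothetical and its values are made up: it assumes that citations mirrors the field names in result, that each field maps to a list of entries, and that each entry carries a bbox with a page key. The helper citations_for is not part of any SDK.

```python
def citations_for(response: dict, field: str) -> list:
    """Return the citation entries for one extracted field, or [] if none."""
    return response.get("citations", {}).get(field, [])


# Hypothetical response shape, for illustration only.
response = {
    "result": {"total": 120.5},
    "citations": {
        "total": [
            {
                "bbox": {
                    "left": 0.62,
                    "top": 0.81,
                    "width": 0.1,
                    "height": 0.02,
                    "page": 1,
                }
            }
        ]
    },
}

value = response["result"]["total"]  # the extracted data
boxes = [c["bbox"] for c in citations_for(response, "total")]  # where it came from
```

Keeping the lookup in one small function makes it easy to adjust if the real citation structure is nested differently from this assumption.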
Each bounding box (bbox) represents the location of text in the document with coordinates relative to the top left corner of the page:
- left, top, width, height: coordinates normalized to the page dimensions, measured from the top left corner of the page.
- parentBlock: Because citations can be very granular and specific, parentBlock identifies the Parse block that contains the extracted data. It provides additional context around the cited text.
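To draw a citation on a rendered page, the normalized values need to be scaled by the page size. The sketch below assumes the bbox is a dictionary with left, top, width, and height in the [0,1] range and that you know the rendered page size in pixels; bbox_to_pixels is an illustrative helper, not a Reducto function.

```python
def bbox_to_pixels(bbox: dict, page_width: int, page_height: int) -> tuple:
    """Convert a normalized bbox to a (left, top, right, bottom) pixel box."""
    left = bbox["left"] * page_width
    top = bbox["top"] * page_height
    right = left + bbox["width"] * page_width
    bottom = top + bbox["height"] * page_height
    # Round to whole pixels so the box can be passed to a drawing routine.
    return tuple(round(v) for v in (left, top, right, bottom))


# A box covering the middle half of the page width, starting halfway down.
box = bbox_to_pixels(
    {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25},
    page_width=1000,
    page_height=2000,
)
print(box)  # (250, 1000, 750, 1500)
```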
Spreadsheet citations
Excel and other spreadsheet formats handle citations differently from PDFs and images:
Coordinate system:
- Excel: Uses actual row/column positions (1-indexed). For example, cell A1 would have coordinates left: 1, top: 1, width: 1, height: 1.
- Other formats: Use normalized coordinates in the [0,1] range relative to page dimensions.
Page field:
- Excel: The page field represents the sheet index (1-indexed). Sheet 1 = page 1, Sheet 2 = page 2, etc.
- Other formats: The page field represents the actual page number in the document.
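For spreadsheets, a citation can be turned back into a familiar cell reference. This sketch assumes that left is the 1-indexed column, top is the 1-indexed row, and width and height count the columns and rows covered, which is consistent with the A1 example above but is not stated explicitly. Both helper names are illustrative.

```python
def column_letters(index: int) -> str:
    """Convert a 1-indexed column number to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, rem = divmod(index - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def excel_bbox_to_range(bbox: dict) -> str:
    """Convert a spreadsheet bbox to an A1-style cell or range reference."""
    first = f"{column_letters(bbox['left'])}{bbox['top']}"
    # Assumption: width and height count the cells covered by the citation.
    last_col = bbox["left"] + bbox["width"] - 1
    last_row = bbox["top"] + bbox["height"] - 1
    last = f"{column_letters(last_col)}{last_row}"
    return first if first == last else f"{first}:{last}"


print(excel_bbox_to_range({"left": 1, "top": 1, "width": 1, "height": 1}))  # A1
print(excel_bbox_to_range({"left": 2, "top": 3, "width": 2, "height": 2}))  # B3:C4
```

Because page is the sheet index for Excel files, the sheet would be looked up by position (page 1 is the first sheet) before applying the cell reference.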
JSON output