Overview
Reducto provides bounding box citations so you can trace extracted data back to its original source inside a document. This feature is especially important for compliance, debugging, and trust in industries that need data grounded in real evidence.
Why use citations?
- Traceability: Confirm where each extracted value came from in the source file.
- Compliance: Maintain audit trails for regulated workflows.
- Debugging: Compare schema outputs against original text to fix errors.
- User experience: In Studio, click between the output field and the highlighted bounding box on the document.
How to enable citations
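The only setting this page names is settings.citations.enabled. As a minimal sketch, assuming request options are assembled as a plain Python dictionary, the flag could be switched on as follows. The helper name with_citations and the some_other_option key are hypothetical and are not part of any Reducto SDK.

```python
from copy import deepcopy


def with_citations(options: dict) -> dict:
    """Return a copy of `options` with settings.citations.enabled set to True.

    `with_citations` is a hypothetical helper for illustration only.
    Only the settings.citations.enabled path comes from the documentation.
    """
    merged = deepcopy(options)  # leave the caller's dictionary untouched
    settings = merged.setdefault("settings", {})
    citations = settings.setdefault("citations", {})
    citations["enabled"] = True
    return merged


# Existing (hypothetical) options are preserved; citations are switched on.
request_options = with_citations({"settings": {"some_other_option": 1}})
print(request_options)
```

Copying before modifying keeps a shared default options dictionary from silently gaining the flag for every later request.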
How citations work
When citations are enabled via settings.citations.enabled, Reducto includes bounding box metadata alongside each extracted field value. In the v3 format, each field in the result contains both a value and its associated citations.
Each bounding box (bbox) represents the location of text in the document with coordinates relative to the top left corner of the page:
- left, top, width, height: coordinates normalized to the page dimensions (values in the [0, 1] range), relative to the top left corner.
- parentBlock: Because citations can be very granular, parentBlock identifies the Parse block that contains the extracted data, which gives additional context around the cited text.
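To draw a citation on a rendered page, the normalized values have to be scaled by the page's size in pixels. The sketch below shows one way to do that. The field names left, top, width, height, and page come from this page; the surrounding keys value, citations, and bbox, the placement of page inside the bbox, and all sample numbers are assumptions made for illustration.

```python
from typing import NamedTuple


class PixelRect(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int


def bbox_to_pixels(bbox: dict, page_width_px: int, page_height_px: int) -> PixelRect:
    """Scale a normalized bbox (origin at the top left) to pixel corners."""
    x0 = bbox["left"] * page_width_px
    y0 = bbox["top"] * page_height_px
    x1 = (bbox["left"] + bbox["width"]) * page_width_px
    y1 = (bbox["top"] + bbox["height"]) * page_height_px
    return PixelRect(round(x0), round(y0), round(x1), round(y1))


# Hypothetical field; the "value", "citations", and "bbox" keys are assumed.
field = {
    "value": "2024-01-15",
    "citations": [
        {"bbox": {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.125, "page": 1}}
    ],
}

for citation in field["citations"]:
    rect = bbox_to_pixels(citation["bbox"], page_width_px=800, page_height_px=1000)
    print(citation["bbox"]["page"], rect)
```

Returning corner coordinates rather than width and height matches what most drawing libraries expect for rectangles.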
Spreadsheet citations
Excel and other spreadsheet formats handle citations differently from PDFs and images:
Coordinate system:
- Excel: Uses actual row/column positions (1-indexed). For example, cell A1 would have coordinates left: 1, top: 1, width: 1, height: 1.
- Other formats: Use normalized coordinates in the [0, 1] range relative to page dimensions.
Page field:
- Excel: The page field represents the sheet index (1-indexed). Sheet 1 = page 1, Sheet 2 = page 2, etc.
- Other formats: The page field represents the actual page number in the document.
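For spreadsheets, it is often handy to turn a citation back into a familiar cell reference. The sketch below assumes that left is the column, top is the row, and width and height count columns and rows, which is consistent with cell A1 being left: 1, top: 1, width: 1, height: 1. The function names are hypothetical, and the multi-cell interpretation is an assumption rather than documented behavior.

```python
def column_letters(index: int) -> str:
    """Convert a 1-indexed column number to spreadsheet letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("column index must be >= 1")
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def bbox_to_a1(bbox: dict) -> str:
    """Map a spreadsheet citation bbox to an A1-style cell or range.

    Assumes integer, 1-indexed values where width/height count cells.
    """
    first_col, first_row = bbox["left"], bbox["top"]
    last_col = first_col + bbox["width"] - 1
    last_row = first_row + bbox["height"] - 1
    start = f"{column_letters(first_col)}{first_row}"
    end = f"{column_letters(last_col)}{last_row}"
    return start if start == end else f"{start}:{end}"


print(bbox_to_a1({"left": 1, "top": 1, "width": 1, "height": 1}))  # A1
print(bbox_to_a1({"left": 2, "top": 3, "width": 2, "height": 4}))  # B3:C6
```

Since the page field holds the 1-indexed sheet index for Excel files, pairing it with the A1 reference is enough to locate the cited cells in the workbook.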
JSON output