Citation Location

The /cite endpoint allows you to find the exact location of text within a parsed document. Given a text string, it returns the bounding boxes where that text appears in the original document. This is useful for highlighting citations, building document viewers, or verifying extracted data against source locations.

Prerequisites

The document must be parsed with OCR data enabled:

from reducto import Reducto

client = Reducto()

# Parse with OCR data enabled
result = client.parse.run(
    document_url="https://example.com/document.pdf",
    options={"return_ocr_data": True}
)

Basic usage

Using a job ID

If you have a job ID from a previous parse operation, you can reference it directly:

import requests

response = requests.post(
    "https://your-reducto-instance/cite",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "source": "jobid://your-job-id",
        "queries": [
            {"text": "Total Revenue"}
        ]
    }
)

result = response.json()

Using a parse result directly

You can also pass the full parse result object:

import requests

# First, parse the document with OCR data
parse_response = requests.post(
    "https://your-reducto-instance/parse",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "document_url": "https://example.com/document.pdf",
        "options": {"return_ocr_data": True}
    }
)

parse_result = parse_response.json()["result"]

# Then find citations
cite_response = requests.post(
    "https://your-reducto-instance/cite",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "source": parse_result,
        "queries": [
            {"text": "Total Revenue"},
            {"text": "Net Income"}
        ]
    }
)

citations = cite_response.json()

Request format

Field	Type	Required	Description
`source`	string or object	Yes	Either a `jobid://<job_id>` string or a full parse result object. The parse must have been run with `return_ocr_data=true`.
`queries`	array	Yes	List of text citations to locate.

Query object

Field	Type	Required	Description
`text`	string	Yes	Text to locate. Whitespace is normalized for matching.
`bbox_filter`	object	No	Bounding box that limits the search to a region of one page.

Bounding box filter

To search within a specific region of a page, add a `bbox_filter` to the query:

{
    "source": "jobid://your-job-id",
    "queries": [
        {
            "text": "Amount",
            "bbox_filter": {
                "page": 1,
                "left": 0.0,
                "top": 0.0,
                "width": 0.5,
                "height": 0.5
            }
        }
    ]
}
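As a rough sketch, a filtered query like this one could be assembled with a small helper. The name `make_region_query` is hypothetical. The range checks assume that coordinates are fractions of the page size between 0 and 1 and that `page` is 1-indexed, which the sample values suggest but this page does not state for the filter.

```python
from typing import Any, Dict


def make_region_query(text: str, page: int, left: float, top: float,
                      width: float, height: float) -> Dict[str, Any]:
    """Build one /cite query limited to a region of a page.

    Assumption: coordinates are fractions of the page size in [0, 1].
    """
    eps = 1e-9  # tolerate floating-point rounding at the page edge
    if page < 1:
        raise ValueError("page is assumed to be 1-indexed")
    if left < 0 or top < 0 or width <= 0 or height <= 0:
        raise ValueError("left/top must be >= 0 and width/height must be > 0")
    if left + width > 1 + eps or top + height > 1 + eps:
        raise ValueError("region extends past the page")
    return {
        "text": text,
        "bbox_filter": {
            "page": page,
            "left": left,
            "top": top,
            "width": width,
            "height": height,
        },
    }


# Search only the top-left quarter of page 1 for "Amount".
query = make_region_query("Amount", page=1, left=0.0, top=0.0, width=0.5, height=0.5)
payload = {"source": "jobid://your-job-id", "queries": [query]}
```

The resulting `payload` has the same shape as the request body shown above and can be sent as the `json` argument of the POST request.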

Response format

{
    "results": [
        {
            "matches": [
                {
                    "page": 1,
                    "bboxes": [
                        {
                            "page": 1,
                            "left": 0.123,
                            "top": 0.456,
                            "width": 0.089,
                            "height": 0.023
                        }
                    ]
                }
            ]
        }
    ],
    "duration": 0.045
}

Field	Description
`results`	Array of results in the same order as input queries (1:1 correspondence).
`results[].matches`	All locations where the text was found. Empty array if no matches.
`results[].matches[].page`	Page number (1-indexed) where the match was found.
`results[].matches[].bboxes`	Bounding boxes for the match. Multiple boxes are returned for multi-line text.
`duration`	Processing time in seconds.
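To make the nesting concrete, here is a minimal sketch that walks a response of the shape shown above and flattens it into one row per bounding box. The helper name `flatten_matches` is illustrative and not part of any SDK.

```python
from typing import Any, Dict, List, Tuple


def flatten_matches(response: Dict[str, Any]) -> List[Tuple[int, int, Dict[str, Any]]]:
    """Return (query_index, page, bbox) for every bounding box in a /cite response."""
    rows = []
    for query_index, result in enumerate(response.get("results", [])):
        for match in result.get("matches", []):
            for bbox in match.get("bboxes", []):
                rows.append((query_index, match["page"], bbox))
    return rows


# Sample response copied from the format above.
sample = {
    "results": [
        {
            "matches": [
                {
                    "page": 1,
                    "bboxes": [
                        {"page": 1, "left": 0.123, "top": 0.456,
                         "width": 0.089, "height": 0.023}
                    ],
                }
            ]
        }
    ],
    "duration": 0.045,
}

rows = flatten_matches(sample)
for query_index, page, bbox in rows:
    print(f"query {query_index}: page {page}, left={bbox['left']}, top={bbox['top']}")
```

A query with no matches contributes no rows, because its `matches` array is empty.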

Text matching behavior

The endpoint normalizes text for matching:

Converts to lowercase
Removes punctuation
Collapses whitespace

This means a query for "Total Revenue" will match "total revenue", "Total Revenue", or "TOTAL REVENUE" in the document.
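The three steps can be approximated locally, for instance to predict whether two strings would be treated as equal. This is only a sketch of the described behavior, not the endpoint's actual implementation. It assumes that punctuation means ASCII punctuation and that punctuation is deleted rather than replaced by a space. The function name `normalize_for_match` is hypothetical.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def normalize_for_match(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse runs of whitespace."""
    lowered = text.lower()
    without_punct = lowered.translate(_PUNCT_TABLE)
    # split() with no arguments splits on any whitespace run and drops empties.
    return " ".join(without_punct.split())


query = normalize_for_match("Total Revenue")
candidates = ["total revenue", "TOTAL   REVENUE", "Total Revenue:"]
all_equal = all(normalize_for_match(c) == query for c in candidates)
```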

Multiple queries

You can search for multiple text strings in a single request. Results are returned in the same order as the queries:

import requests

response = requests.post(
    "https://your-reducto-instance/cite",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "source": "jobid://your-job-id",
        "queries": [
            {"text": "Revenue"},
            {"text": "Expenses"},
            {"text": "Net Income"}
        ]
    }
)

result = response.json()
# result["results"][0] corresponds to "Revenue"
# result["results"][1] corresponds to "Expenses"
# result["results"][2] corresponds to "Net Income"
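Because results come back in query order, the two lists can be paired by position instead of by hard-coded index. A minimal sketch, with the hypothetical helper name `pair_queries_with_results`:

```python
from typing import Any, Dict, List, Tuple


def pair_queries_with_results(
    queries: List[Dict[str, Any]], response: Dict[str, Any]
) -> List[Tuple[str, List[Dict[str, Any]]]]:
    """Pair each query text with its matches, relying on the 1:1 ordering."""
    results = response["results"]
    if len(results) != len(queries):
        raise ValueError("expected exactly one result per query")
    return [(query["text"], result["matches"])
            for query, result in zip(queries, results)]


queries = [{"text": "Revenue"}, {"text": "Expenses"}]
# Hand-written sample response: one match for "Revenue", none for "Expenses".
response_body = {
    "results": [
        {"matches": [{"page": 2, "bboxes": []}]},
        {"matches": []},
    ]
}

paired = pair_queries_with_results(queries, response_body)
not_found = [text for text, matches in paired if not matches]
```

A list of pairs is used rather than a dictionary so that two queries with the same text do not overwrite each other.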

Multi-line text

When the matched text spans multiple lines, the response includes separate bounding boxes for each line:

{
    "matches": [
        {
            "page": 1,
            "bboxes": [
                {
                    "page": 1,
                    "left": 0.1,
                    "top": 0.2,
                    "width": 0.3,
                    "height": 0.02
                },
                {
                    "page": 1,
                    "left": 0.1,
                    "top": 0.22,
                    "width": 0.25,
                    "height": 0.02
                }
            ]
        }
    ]
}
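If a viewer needs a single rectangle per match rather than one per line, the per-line boxes can be merged into their enclosing box. The sketch below assumes all boxes of a match lie on the same page. `union_bbox` is a hypothetical helper name.

```python
from typing import Any, Dict, List


def union_bbox(bboxes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Return the smallest box that encloses every box in the list."""
    if not bboxes:
        raise ValueError("need at least one bounding box")
    pages = {b["page"] for b in bboxes}
    if len(pages) != 1:
        raise ValueError("boxes must all be on the same page")
    left = min(b["left"] for b in bboxes)
    top = min(b["top"] for b in bboxes)
    right = max(b["left"] + b["width"] for b in bboxes)
    bottom = max(b["top"] + b["height"] for b in bboxes)
    return {
        "page": bboxes[0]["page"],
        "left": left,
        "top": top,
        "width": right - left,
        "height": bottom - top,
    }


# The two line boxes from the example above.
line_boxes = [
    {"page": 1, "left": 0.1, "top": 0.2, "width": 0.3, "height": 0.02},
    {"page": 1, "left": 0.1, "top": 0.22, "width": 0.25, "height": 0.02},
]
merged = union_bbox(line_boxes)
```

Note that the merged box can cover whitespace that no individual line box covers, so per-line boxes remain the better choice for precise highlighting.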

Error handling

Status Code	Description
400	Invalid source format, or OCR data not available (document was not parsed with `return_ocr_data=true`).
401	Invalid or missing API key.
404	Job ID not found or not accessible.
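One way to act on these codes is to translate them into a single exception type that carries a hint. This is a sketch only: `CiteError` and `raise_for_cite_status` are hypothetical names, and the hint texts paraphrase the table above.

```python
# Hints paraphrased from the status code table; other codes get a generic message.
CITE_ERROR_HINTS = {
    400: "Invalid source format, or the document was parsed without return_ocr_data",
    401: "Invalid or missing API key",
    404: "Job ID not found or not accessible",
}


class CiteError(Exception):
    """Raised when /cite answers with an error status."""

    def __init__(self, status_code: int, message: str) -> None:
        super().__init__(f"{status_code}: {message}")
        self.status_code = status_code


def raise_for_cite_status(status_code: int) -> None:
    """Raise CiteError for any 4xx/5xx status; return None otherwise."""
    if status_code < 400:
        return
    hint = CITE_ERROR_HINTS.get(status_code, "Unexpected error")
    raise CiteError(status_code, hint)


try:
    raise_for_cite_status(404)
except CiteError as err:
    message = str(err)
```

With the `requests` library, the status would come from `response.status_code`, checked before calling `response.json()`.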

Use cases

Highlighting extracted values

After extracting structured data, use /cite to highlight where each value appears in the original document:

import requests
from reducto import Reducto

client = Reducto()

# Extract data
extract_response = client.extract.run(
    document_url="https://example.com/invoice.pdf",
    schema={
        "type": "object",
        "properties": {
            "total": {"type": "string"},
            "vendor": {"type": "string"}
        }
    },
    options={"return_ocr_data": True}
)

extracted = extract_response.result.data
job_id = extract_response.job_id

# Find where each value appears
cite_response = requests.post(
    "https://your-reducto-instance/cite",
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "source": f"jobid://{job_id}",
        "queries": [
            {"text": extracted["total"]},
            {"text": extracted["vendor"]}
        ]
    }
)
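Extracted fields can be missing or empty, and a query needs a text string. The following sketch builds the `queries` list defensively and remembers which field each query belongs to. It assumes the extracted data is a flat dictionary, and the helper name `build_field_queries` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def build_field_queries(
    extracted: Dict[str, Any]
) -> Tuple[List[str], List[Dict[str, str]]]:
    """Return (field_names, queries), skipping fields with no usable text."""
    field_names: List[str] = []
    queries: List[Dict[str, str]] = []
    for name, value in extracted.items():
        if value is None:
            continue
        text = str(value).strip()
        if not text:
            continue
        field_names.append(name)
        queries.append({"text": text})
    return field_names, queries


# Made-up extraction output, for illustration only.
sample_extracted = {"total": "$1,200.00", "vendor": "Acme Corp", "po_number": None, "notes": "  "}
field_names, queries = build_field_queries(sample_extracted)
```

Since results are returned in query order, `field_names[i]` names the field whose location is in `results[i]`.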

Building document viewers

Use the bounding boxes to draw highlights or annotations on document pages in your application.
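Drawing a highlight usually requires pixel coordinates for the rendered page image. The conversion below is a sketch under two assumptions that this page does not state explicitly: bounding box values are fractions of the page width and height, and the origin is the top-left corner. `bbox_to_pixels` is a hypothetical helper name.

```python
from typing import Any, Dict, Tuple


def bbox_to_pixels(
    bbox: Dict[str, Any], page_width_px: int, page_height_px: int
) -> Tuple[int, int, int, int]:
    """Convert a fractional bbox to (x0, y0, x1, y1) pixel corners.

    Assumption: values are fractions of the page size, origin at top-left.
    """
    x0 = round(bbox["left"] * page_width_px)
    y0 = round(bbox["top"] * page_height_px)
    x1 = round((bbox["left"] + bbox["width"]) * page_width_px)
    y1 = round((bbox["top"] + bbox["height"]) * page_height_px)
    return x0, y0, x1, y1


# A box covering the middle band of a page rendered at 1000 x 800 pixels.
rect = bbox_to_pixels(
    {"page": 1, "left": 0.5, "top": 0.25, "width": 0.25, "height": 0.5},
    page_width_px=1000,
    page_height_px=800,
)
```

The returned corner tuple is in the form most drawing libraries accept for rectangles.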

Data verification

Verify that extracted values actually appear in the expected locations within the source document.
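A simple check of this kind tests whether any returned bounding box lies inside a region where the value is expected. In this sketch the expected region uses the same fields as a bounding box (`page`, `left`, `top`, `width`, `height`). The helper names and the tolerance value are illustrative choices.

```python
from typing import Any, Dict


def bbox_inside(bbox: Dict[str, Any], region: Dict[str, Any], tol: float = 1e-6) -> bool:
    """True if bbox is on the region's page and fully inside it (within tol)."""
    if bbox["page"] != region["page"]:
        return False
    return (
        bbox["left"] >= region["left"] - tol
        and bbox["top"] >= region["top"] - tol
        and bbox["left"] + bbox["width"] <= region["left"] + region["width"] + tol
        and bbox["top"] + bbox["height"] <= region["top"] + region["height"] + tol
    )


def found_in_region(result: Dict[str, Any], region: Dict[str, Any]) -> bool:
    """True if any bounding box of any match falls inside the region."""
    return any(
        bbox_inside(bbox, region)
        for match in result["matches"]
        for bbox in match["bboxes"]
    )


# One entry of "results", shaped like the response format on this page.
result = {
    "matches": [
        {"page": 1, "bboxes": [
            {"page": 1, "left": 0.6, "top": 0.8, "width": 0.2, "height": 0.03}
        ]}
    ]
}
bottom_right = {"page": 1, "left": 0.5, "top": 0.5, "width": 0.5, "height": 0.5}
top_left = {"page": 1, "left": 0.0, "top": 0.0, "width": 0.5, "height": 0.5}

ok = found_in_region(result, bottom_right)
misplaced = not found_in_region(result, top_left)
```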
