Extract returns your data as structured JSON that matches your schema. The response format differs depending on whether citations are enabled.

Response Structure

Without Citations (Default)

When citations are disabled (the default), result contains an array of objects with your extracted values directly:
{
  "job_id": "9531166f-9725-4854-8096-459785a33972",
  "result": [
    {
      "invoice_number": "INV-2024-001",
      "total": 1575.00,
      "line_items": [
        {
          "description": "Professional Services",
          "quantity": 10,
          "amount": 1500.00
        },
        {
          "description": "Materials",
          "quantity": 1,
          "amount": 75.00
        }
      ]
    }
  ],
  "usage": {
    "num_pages": 1,
    "num_fields": 8,
    "credits": 8.0
  },
  "studio_link": "https://studio.reducto.ai/job/9531166f-..."
}

Top-Level Fields

| Field | Type | Description |
| --- | --- | --- |
| job_id | string | Unique identifier for this extraction job. Use this to retrieve results later or reference in support requests. |
| result | array or object | Without citations: an array containing your extracted data. With citations: an object with wrapped values. |
| usage.num_pages | integer | Number of document pages processed. |
| usage.num_fields | integer | Total number of fields extracted, including nested fields in arrays. |
| usage.credits | number | Credits consumed for this extraction. |
| studio_link | string | Link to view and debug this extraction in Reducto Studio. |
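
Because the shape of result depends on the citations setting, client code may want to branch on it. The following is a minimal sketch, assuming the response body has already been parsed into plain Python dicts and lists (for example with json.loads). first_record is a hypothetical helper written for illustration, not part of any SDK.

```python
from typing import Any, Optional


def first_record(result: Any) -> Optional[dict]:
    """Return the extracted record regardless of the response shape."""
    if isinstance(result, list):
        # Without citations: an array, usually holding a single object.
        return result[0] if result else None
    # With citations: a single object whose values are wrapped.
    return result


# Illustrative inputs only; real responses carry more fields.
plain = [{"invoice_number": "INV-2024-001"}]
cited = {"invoice_number": {"value": "INV-2024-001", "citations": []}}

print(first_record(plain)["invoice_number"])
print(first_record(cited)["invoice_number"]["value"])
```

Both calls reach the same invoice number; only the final access step differs between the two shapes.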

Accessing Values

Without Citations

When citations are disabled, access values directly from the result array:
# Access the first (usually only) result object
data = result.result[0]

# Access scalar fields directly
invoice_number = data["invoice_number"]
total = data["total"]

# Access array items
for item in data["line_items"]:
    print(f"{item['description']}: ${item['amount']}")

With Citations

When citations are enabled, values are wrapped in objects with value and citations fields:
# With citations, result is a dict (not an array)
invoice_number = result.result["invoice_number"].value
total = result.result["total"].value

# Access array items
for item in result.result["line_items"]:
    print(f"{item['description'].value}: ${item['amount'].value}")
When a field cannot be extracted, it may appear as null or be absent entirely, depending on whether it was marked as required in your schema.
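
Since a field that could not be extracted may be either null or missing, it can help to treat both cases the same way when reading values. This is a sketch for the shape without citations, assuming the record is a plain dict parsed from JSON. get_field is a hypothetical helper name.

```python
from typing import Any, Optional


def get_field(record: dict, name: str, default: Optional[Any] = None) -> Any:
    """Return record[name], or default when the key is absent or null."""
    value = record.get(name)  # None for both a missing key and a JSON null
    return default if value is None else value


# Illustrative record: "total" came back null and "date" is absent.
record = {"invoice_number": "INV-2024-001", "total": None}

invoice_number = get_field(record, "invoice_number")
total = get_field(record, "total", default=0.0)
date = get_field(record, "date")
```

Whether a default such as 0.0 is appropriate depends on the field; for amounts it is often safer to keep None and handle it explicitly.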

Citations

When settings.citations.enabled is true, the response format changes. The result becomes an object (not an array), and each value is wrapped with citation data:
{
  "result": {
    "total": {
      "value": 1575.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total Due: $1,575.00",
          "bbox": {
            "left": 0.65,
            "top": 0.82,
            "width": 0.25,
            "height": 0.03,
            "page": 1,
            "original_page": 1
          },
          "confidence": "high",
          "granular_confidence": {
            "extract_confidence": 0.95,
            "parse_confidence": 0.91
          },
          "parentBlock": {
            "type": "Table",
            "content": "Invoice Total\nTotal Due: $1,575.00",
            "bbox": {"left": 0.60, "top": 0.78, "width": 0.35, "height": 0.08, "page": 1}
          }
        }
      ]
    }
  }
}

Citation Fields

| Field | Description |
| --- | --- |
| type | Block type where the value was found: Text, Table, Key Value, etc. |
| content | The source text from which the value was extracted. May differ slightly from the extracted value due to formatting normalization. |
| bbox | Bounding box coordinates for the source location. |
| confidence | Overall confidence as "high" or "low". |
| granular_confidence | Detailed confidence breakdown with extract_confidence (0-1) and parse_confidence (0-1). |
| parentBlock | The larger Parse block containing this citation. Useful for context when the citation is very granular. |

Bounding Box Coordinates

All coordinates are normalized to the range [0, 1] relative to page dimensions:
| Field | Description |
| --- | --- |
| left | Distance from the left edge of the page. 0 is the left edge, 1 is the right edge. |
| top | Distance from the top edge of the page. 0 is the top, 1 is the bottom. |
| width | Width as a fraction of page width. |
| height | Height as a fraction of page height. |
| page | Page number (1-indexed) in the processed document. |
| original_page | Page number in the original document. Differs from page when using page_range to process a subset. |
To convert to pixel coordinates, multiply by the page dimensions:
# Example: a US Letter page is 612x792 points (one point per pixel at 72 DPI)
bbox = citation.bbox
pixel_left = bbox.left * 612
pixel_top = bbox.top * 792
pixel_width = bbox.width * 612
pixel_height = bbox.height * 792
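
When cropping a citation out of a rendered page image, a (left, top, right, bottom) box in whole pixels is often more convenient than separate width and height values. This is a sketch under the assumption that the bbox is a plain dict parsed from JSON and that you know the rendered image size. bbox_to_pixel_box is a hypothetical helper name.

```python
from typing import Tuple


def bbox_to_pixel_box(bbox: dict, page_width_px: int, page_height_px: int) -> Tuple[int, int, int, int]:
    """Convert a normalized bbox to (left, top, right, bottom) in pixels."""
    left = bbox["left"] * page_width_px
    top = bbox["top"] * page_height_px
    right = (bbox["left"] + bbox["width"]) * page_width_px
    bottom = (bbox["top"] + bbox["height"]) * page_height_px
    # Round to the nearest pixel so the box can be used for image cropping.
    return round(left), round(top), round(right), round(bottom)


# Illustrative values: a page rendered at 1000x800 pixels.
box = bbox_to_pixel_box(
    {"left": 0.5, "top": 0.25, "width": 0.25, "height": 0.5, "page": 1},
    page_width_px=1000,
    page_height_px=800,
)
```

For a PDF page rendered at a given DPI, the pixel dimensions are the page size in points multiplied by DPI / 72.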

Array Citations

For array fields, each item in the array has its own citations. The structure mirrors the data:
{
  "line_items": [
    {
      "description": {
        "value": "Professional Services",
        "citations": [{"bbox": {...}, "content": "Professional Services", ...}]
      },
      "amount": {
        "value": 1500.00,
        "citations": [{"bbox": {...}, "content": "$1,500.00", ...}]
      }
    },
    {
      "description": {
        "value": "Materials",
        "citations": [{"bbox": {...}, "content": "Materials", ...}]
      },
      "amount": {
        "value": 75.00,
        "citations": [{"bbox": {...}, "content": "$75.00", ...}]
      }
    }
  ]
}
Each field within each array item has its own citation pointing to where that specific value was found.
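
If downstream code only needs the values, the wrapped structure can be reduced to the same shape as a response without citations. The sketch below assumes the result is a parsed JSON structure and treats any dict with both a value and a citations key as a wrapper. unwrap is a hypothetical helper, and it would misfire on a schema whose own objects happen to use those two field names.

```python
from typing import Any


def unwrap(node: Any) -> Any:
    """Recursively strip citation wrappers, keeping only extracted values."""
    if isinstance(node, dict):
        if "value" in node and "citations" in node:
            # A wrapped value: discard the citations, keep the value.
            return unwrap(node["value"])
        return {key: unwrap(child) for key, child in node.items()}
    if isinstance(node, list):
        return [unwrap(child) for child in node]
    return node


# Illustrative input with citations trimmed to empty lists.
wrapped = {
    "line_items": [
        {
            "description": {"value": "Professional Services", "citations": []},
            "amount": {"value": 1500.00, "citations": []},
        },
        {
            "description": {"value": "Materials", "citations": []},
            "amount": {"value": 75.00, "citations": []},
        },
    ]
}

plain_values = unwrap(wrapped)
```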

Spreadsheet Citations

Excel and other spreadsheet formats use a different coordinate system because they have cells, not continuous pages.

Coordinate Differences

| Aspect | PDFs/Images | Spreadsheets |
| --- | --- | --- |
| Coordinate system | Normalized 0-1 range | Cell positions (1-indexed) |
| left | Fraction of page width | Column number (1 = A, 2 = B, etc.) |
| top | Fraction of page height | Row number |
| width | Fraction of page width | Number of columns spanned |
| height | Fraction of page height | Number of rows spanned |
| page | Page number | Sheet index (1 = first sheet) |

Example Spreadsheet Citation

{
  "bbox": {
    "left": 2,       // Column B
    "top": 5,        // Row 5
    "width": 1,      // Single column
    "height": 1,     // Single row
    "page": 1,       // First sheet
    "original_page": 1
  }
}
This citation points to cell B5 on the first sheet. The coordinates map directly to Excel’s A1 notation, making it straightforward to locate the source cell programmatically.
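
The mapping to A1 notation can be sketched as follows. This assumes the bbox is a plain dict parsed from JSON and that width and height count the columns and rows spanned, as described in the table above. column_letter and bbox_to_a1 are hypothetical helper names.

```python
def column_letter(number: int) -> str:
    """Convert a 1-indexed column number to letters (1 -> A, 27 -> AA)."""
    letters = ""
    while number > 0:
        # Shift to 0-indexed before dividing because there is no zero digit.
        number, remainder = divmod(number - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def bbox_to_a1(bbox: dict) -> str:
    """Return the A1 reference (single cell or range) for a spreadsheet bbox."""
    first_col, first_row = int(bbox["left"]), int(bbox["top"])
    last_col = first_col + int(bbox["width"]) - 1
    last_row = first_row + int(bbox["height"]) - 1
    start = f"{column_letter(first_col)}{first_row}"
    end = f"{column_letter(last_col)}{last_row}"
    return start if start == end else f"{start}:{end}"


cell = bbox_to_a1({"left": 2, "top": 5, "width": 1, "height": 1, "page": 1})
print(cell)  # B5
```

The sheet itself is identified separately by page, which is a 1-indexed sheet position rather than a sheet name.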

Confidence Scores

Confidence indicates how certain the extraction is about a value. Each citation includes both summary and detailed confidence information.

Summary Confidence

The confidence field provides a quick assessment:
"confidence": "high"
Values are either "high" or "low" based on internal thresholds.

Granular Confidence

The granular_confidence object provides detailed numerical scores:
"granular_confidence": {
  "extract_confidence": 0.95,
  "parse_confidence": 0.91
}
| Score | Description |
| --- | --- |
| extract_confidence | How confident the extraction LLM is about this value (0-1). May be null for array items. |
| parse_confidence | How confident the parsing stage was about the source text (0-1). Reflects OCR and layout detection quality. |
Use granular confidence when you need to set custom thresholds or debug extraction issues. Low parse_confidence suggests the source document may have OCR or layout problems. Low extract_confidence suggests the schema description may need refinement.
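
A custom threshold check might look like the sketch below. It assumes each citation is a plain dict parsed from JSON. The 0.8 cutoffs are arbitrary placeholders rather than recommended values, and needs_review is a hypothetical helper name. A null score is skipped instead of being treated as low, since extract_confidence may be null for array items.

```python
def needs_review(citation: dict, min_extract: float = 0.8, min_parse: float = 0.8) -> bool:
    """Flag a citation whose summary or granular confidence falls short."""
    if citation.get("confidence") == "low":
        return True
    granular = citation.get("granular_confidence") or {}
    thresholds = (
        ("extract_confidence", min_extract),
        ("parse_confidence", min_parse),
    )
    for key, minimum in thresholds:
        score = granular.get(key)
        # Skip missing or null scores; only a real number can fail the check.
        if score is not None and score < minimum:
            return True
    return False


# Illustrative citations.
confident = {
    "confidence": "high",
    "granular_confidence": {"extract_confidence": 0.95, "parse_confidence": 0.91},
}
blurry_scan = {
    "confidence": "high",
    "granular_confidence": {"extract_confidence": None, "parse_confidence": 0.55},
}

print(needs_review(confident))    # False
print(needs_review(blurry_scan))  # True
```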

Usage and Credits

The usage object shows what was processed and what it cost:
{
  "usage": {
    "num_pages": 3,
    "num_fields": 24,
    "credits": 12.0
  }
}
| Field | Description |
| --- | --- |
| num_pages | Document pages that were processed. Affected by page_range settings. |
| num_fields | Total leaf fields extracted. A schema with 5 scalar fields and an array of 10 objects with 2 fields each would report 25 fields. |
| credits | Credits charged. Based on pages processed plus complexity factors like agentic modes and latency optimization. |
Credit calculation varies based on:
  • Number of pages processed
  • Whether agentic parsing modes were used
  • Whether optimize_for_latency was enabled (2x multiplier)
  • Spreadsheet complexity (cell count for Excel files)
See Credit Usage for detailed pricing.
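
The leaf-field arithmetic behind num_fields (5 + 10 × 2 = 25) can be reproduced with a short recursive count. This sketch only illustrates the counting rule described above on a result without citations. It is not necessarily how the service computes the number, for example when fields are null or absent. count_leaf_fields is a hypothetical helper name.

```python
from typing import Any


def count_leaf_fields(node: Any) -> int:
    """Count scalar leaves in a nested structure of dicts and lists."""
    if isinstance(node, dict):
        return sum(count_leaf_fields(child) for child in node.values())
    if isinstance(node, list):
        return sum(count_leaf_fields(child) for child in node)
    return 1  # Any scalar (string, number, boolean, null) is one leaf.


# Illustrative data: 5 scalar fields plus an array of 10 objects with 2 fields each.
record = {f"field_{i}": "x" for i in range(5)}
record["items"] = [{"name": "n", "amount": 1.0} for _ in range(10)]

print(count_leaf_fields(record))  # 25
```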

Complete Example

{
  "job_id": "543d1950-068c-4e38-981d-98903326b554",
  "result": {
    "invoice_number": {
      "value": "INV-2024-001",
      "citations": [
        {
          "type": "Text",
          "content": "Invoice #INV-2024-001",
          "bbox": {"left": 0.70, "top": 0.08, "width": 0.20, "height": 0.02, "page": 1, "original_page": 1},
          "confidence": "high",
          "granular_confidence": {"extract_confidence": 0.98, "parse_confidence": 0.95}
        }
      ]
    },
    "date": {
      "value": "2024-01-15",
      "citations": [
        {
          "type": "Text",
          "content": "Date: January 15, 2024",
          "bbox": {"left": 0.70, "top": 0.11, "width": 0.15, "height": 0.02, "page": 1, "original_page": 1},
          "confidence": "high",
          "granular_confidence": {"extract_confidence": 0.96, "parse_confidence": 0.94}
        }
      ]
    },
    "total": {
      "value": 1575.00,
      "citations": [
        {
          "type": "Table",
          "content": "Total: $1,575.00",
          "bbox": {"left": 0.75, "top": 0.85, "width": 0.15, "height": 0.02, "page": 1, "original_page": 1},
          "confidence": "high",
          "granular_confidence": {"extract_confidence": 0.97, "parse_confidence": 0.91}
        }
      ]
    },
    "line_items": [
      {
        "description": {
          "value": "Professional Services",
          "citations": [
            {
              "type": "Table",
              "content": "Professional Services",
              "bbox": {"left": 0.10, "top": 0.45, "width": 0.35, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.93}
            }
          ]
        },
        "amount": {
          "value": 1500.00,
          "citations": [
            {
              "type": "Table",
              "content": "$1,500.00",
              "bbox": {"left": 0.78, "top": 0.45, "width": 0.12, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.93}
            }
          ]
        }
      },
      {
        "description": {
          "value": "Materials",
          "citations": [
            {
              "type": "Table",
              "content": "Materials",
              "bbox": {"left": 0.10, "top": 0.48, "width": 0.20, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.91}
            }
          ]
        },
        "amount": {
          "value": 75.00,
          "citations": [
            {
              "type": "Table",
              "content": "$75.00",
              "bbox": {"left": 0.78, "top": 0.48, "width": 0.10, "height": 0.02, "page": 1, "original_page": 1},
              "confidence": "high",
              "granular_confidence": {"extract_confidence": null, "parse_confidence": 0.91}
            }
          ]
        }
      }
    ]
  },
  "usage": {
    "num_pages": 1,
    "num_fields": 8,
    "credits": 6.0
  },
  "studio_link": "https://studio.reducto.ai/job/543d1950-068c-4e38-981d-98903326b554"
}
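
To review where every value in a response like this came from, the result can be flattened into one row per citation. The sketch below assumes the result object has been parsed into plain dicts and lists, and that a dict with both value and citations keys is a wrapped value. iter_citations is a hypothetical helper name.

```python
from typing import Any, Iterator, Tuple


def iter_citations(node: Any, path: str = "") -> Iterator[Tuple[str, Any, dict]]:
    """Yield (field path, extracted value, citation) for every citation found."""
    if isinstance(node, dict):
        if "value" in node and "citations" in node:
            for citation in node["citations"] or []:
                yield path, node["value"], citation
            return
        for key, child in node.items():
            yield from iter_citations(child, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for index, child in enumerate(node):
            yield from iter_citations(child, f"{path}[{index}]")


# Illustrative result, trimmed to two fields.
result = {
    "total": {
        "value": 1575.00,
        "citations": [{"content": "Total: $1,575.00", "bbox": {"page": 1}}],
    },
    "line_items": [
        {"amount": {"value": 75.00, "citations": [{"content": "$75.00", "bbox": {"page": 1}}]}}
    ],
}

rows = list(iter_citations(result))
for field_path, value, citation in rows:
    print(field_path, value, citation["content"], citation["bbox"]["page"])
```

The field paths (such as line_items[0].amount) are only a display convention chosen for this sketch.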