Either “full” or “url” — defines how the chunks are delivered (inline or via URL).
Present only when type is “full”: an array of chunk objects.
The content of the chunk extracted from the document.
Chunk content optimized for embedding and retrieval. It can differ from content, for example when figure/table summaries are enabled.
The enriched content of the chunk extracted from the document.
Array of block-level structures—such as Text, Table, or Figure blocks.
Block type indicating the content category: e.g., “Text”, “Table”, “Figure”.
Contains bounding box information for the block, using normalized or spreadsheet-based coordinates.
Normalized horizontal start coordinate (0–1 range), or column index in spreadsheets.
Normalized vertical start coordinate (0–1 range), or row index in spreadsheets.
Normalized width relative to page size (0–1 range), or column width count in spreadsheets.
Normalized height relative to page size (0–1 range), or row height count in spreadsheets.
The parsed page number (1-indexed)—either page index or sheet index for spreadsheets.
blocks[].bbox.original_page
The original page number from the source document, useful when filtering or slicing page ranges.
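For paged documents (not spreadsheets), the normalized bounding box can be mapped to pixel coordinates by scaling it with the rendered page size. The following is a minimal sketch, not the service's own code. It assumes the four coordinate fields are named `left`, `top`, `width`, and `height`; these names are hypothetical, since this reference does not spell them out, and the page dimensions must come from your own rendering of the page.

```python
def bbox_to_pixels(bbox, page_width_px, page_height_px):
    """Scale a normalized (0-1) bounding box to integer pixel corners.

    The keys "left", "top", "width", and "height" are assumed names.
    This conversion does not apply to spreadsheet blocks, whose bbox
    values are column/row indices and counts rather than fractions.
    """
    x0 = round(bbox["left"] * page_width_px)
    y0 = round(bbox["top"] * page_height_px)
    x1 = round((bbox["left"] + bbox["width"]) * page_width_px)
    y1 = round((bbox["top"] + bbox["height"]) * page_height_px)
    return x0, y0, x1, y1


# A block covering the middle half of the page width, starting halfway down.
example = {"left": 0.25, "top": 0.5, "width": 0.5, "height": 0.25}
print(bbox_to_pixels(example, 1000, 2000))  # (250, 1000, 750, 1500)
```

The function returns corner coordinates rather than a width/height pair because most image libraries crop with `(x0, y0, x1, y1)` boxes.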
The actual content of the block—text, table HTML, or figure caption/data.
Presigned URL (if enabled) to download the block’s figure or table image; may expire (~24 h).
Confidence level of the block: either “low” or “high”.
blocks[].logprobs_confidence
Numeric confidence score based on logprobs and OCR confidence.
Present only when type is “url”: endpoint to fetch the chunk JSON remotely.
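Because chunks arrive either inline or behind a URL, client code usually needs one branch per delivery mode. The sketch below illustrates that dispatch under stated assumptions: the key names `chunks` and `url` are hypothetical (only `type` is named in this reference), and the remote JSON is assumed to be the chunk array itself.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """Download a URL and decode its body as JSON."""
    with urlopen(url) as resp:
        return json.load(resp)


def load_chunks(result, fetch=fetch_json):
    """Return the chunk list regardless of how it was delivered.

    `result` is the parsed response object. The keys "chunks" and "url"
    are assumed names; `fetch` is injectable so the URL branch can be
    exercised without network access.
    """
    kind = result["type"]
    if kind == "full":
        return result["chunks"]      # chunks are delivered inline
    if kind == "url":
        return fetch(result["url"])  # chunks must be downloaded
    raise ValueError(f"unexpected result type: {kind!r}")


# Inline delivery: no network call is made.
inline = {"type": "full", "chunks": [{"content": "Hello"}]}
print(load_chunks(inline))  # [{'content': 'Hello'}]
```

Keeping the download behind a `fetch` parameter also makes it easy to add retries or authentication later without touching the dispatch logic.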