Learn about the response format for our parsing endpoint.
For example, the text content of the first chunk is available at `response['result']['chunks'][0]['content']`.
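The access path above reads chunks that are returned inline. Larger documents may instead return a URL to the chunks (see "What is the result type?" below), so client code can normalize both cases before reading `content`. The following is a minimal sketch that assumes the result object carries a `type` field set to `full` or `url` and, for `url`, a `url` field. The names `get_chunks`, `chunk_texts`, and `fetch_json` are hypothetical helpers, not part of any official SDK.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List

# ASSUMPTION: result["type"] is "full" or "url"; for "url", result["url"]
# points to a JSON array of chunks with the same structure as inline chunks.


def fetch_json(url: str) -> Any:
    """Download a URL and decode its body as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def get_chunks(
    response: Dict[str, Any],
    fetch: Callable[[str], Any] = fetch_json,
) -> List[Dict[str, Any]]:
    """Return the chunk list whether it was inlined or returned by URL."""
    result = response["result"]
    kind = result["type"]
    if kind == "full":
        # Chunks are embedded directly in the response.
        return result["chunks"]
    if kind == "url":
        # Chunks must be downloaded; `fetch` is injectable for testing.
        return fetch(result["url"])
    raise ValueError(f"unknown result type: {kind!r}")


def chunk_texts(
    response: Dict[str, Any],
    fetch: Callable[[str], Any] = fetch_json,
) -> List[str]:
    """Collect the content field of every chunk."""
    return [chunk["content"] for chunk in get_chunks(response, fetch)]


if __name__ == "__main__":
    inline = {"result": {"type": "full", "chunks": [{"content": "Hello"}]}}
    print(chunk_texts(inline))  # ['Hello']
```

Injecting `fetch` keeps the network call out of the branching logic, so the `url` path can be exercised without a live endpoint.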
What is the difference between page and original_page?
When you parse only a subset of a document's pages (for example, with `advanced_options` -> `page_range`), each block carries two page numbers. `page` is the page number of the block within the returned content (e.g. of the pages parsed, this block was on the 1st page), while `original_page` is the page number in the source document (e.g. in the original document, this block was on the 10th page).

How do `embed` and `content` differ for the blocks?
For table blocks, the `embed` field contains a summarized version of the table, while the `content` field contains the original table content. We have found that this improves the downstream performance of embedding models, which are not as capable of reasoning over complex tabular HTML.

What is the result type?
For larger documents, the result type will be `url`, and the response will contain a URL pointing to a JSON array of chunks. For smaller documents, the result type will be `full`, and the chunks are included directly in the response. The chunks have the same structure in both cases; the only difference is whether they are returned directly or need to be fetched from a URL. You can read more about the URL response format here.

How can I get images for figures and tables?
Enable the `return_figure_images` parameter in the experimental options to get image URLs for figures in the document. Similarly, you can enable the `return_table_images` parameter to get image URLs for tables. When enabled, the corresponding blocks will include an `image_url` field that points to an image of the figure or table.

Can Reducto handle checkboxes?
Yes. Enable the `enable_checkboxes` parameter in the experimental options to add checkboxes to the output.

Does Reducto handle skewed / rotated pages?
Yes. The `rotate_pages` parameter in the experimental options lets us automatically fix skewed pages for you. It is turned on by default.

Can you return equations and subscripts/superscripts?
Yes. Enable the `enable_equations` parameter in the experimental options for the output to contain equations, or the `enable_scripts` parameter for it to contain `<sub>` and `<sup>` tags.

How do Excel/spreadsheet citations work differently?
For spreadsheets, bounding boxes are reported as `left: 1, top: 1, width: 1, height: 1`, and the `page` field represents the sheet index (1-indexed): Sheet 1 = page 1, Sheet 2 = page 2, etc. For other documents, the `page` field represents the actual page number in the document.
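Because `page` means a sheet index for spreadsheets but a page number for other documents, code that displays citations has to branch on the source type. The following is a minimal sketch that assumes the caller knows the source filename. The helpers `is_spreadsheet` and `describe_location` and the extension list are hypothetical, not part of the Reducto API.

```python
from typing import Optional, Sequence

# HYPOTHETICAL: which file extensions count as spreadsheets is an assumption.
SPREADSHEET_EXTENSIONS = (".xlsx", ".xls")


def is_spreadsheet(filename: str) -> bool:
    """Guess whether the source document is a spreadsheet from its extension."""
    return filename.lower().endswith(SPREADSHEET_EXTENSIONS)


def describe_location(
    filename: str,
    page: int,
    sheet_names: Optional[Sequence[str]] = None,
) -> str:
    """Turn a citation's 1-indexed page field into a human-readable label."""
    if page < 1:
        raise ValueError("page is 1-indexed")
    if is_spreadsheet(filename):
        # For spreadsheets, page is the sheet index: Sheet 1 = page 1, etc.
        if sheet_names is not None:
            return f"sheet '{sheet_names[page - 1]}'"
        return f"sheet {page}"
    # For other documents, page is the actual page number.
    return f"page {page}"
```

For example, `describe_location("budget.xlsx", 2, ["Q1", "Q2"])` returns `"sheet 'Q2'"`, while `describe_location("report.pdf", 10)` returns `"page 10"`.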