Response Format
Learn about the response format for our parsing endpoint.
The parsing response format is optimized for flexibility with retrieval-augmented generation. If you only need a Markdown representation of a given document, you can disable chunking altogether and read response['result']['chunks'][0]['content'].
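As a minimal sketch of that access pattern, the helper below pulls the content of the first chunk out of a parsed response. The function name and the sample response are hypothetical and trimmed to the fields used here; a real response contains more fields.

```python
def document_markdown(response: dict) -> str:
    """Return the content of the first chunk of a parse response.

    With chunking disabled, the first chunk is expected to hold the whole document.
    """
    chunks = response["result"]["chunks"]
    if not chunks:
        raise ValueError("response contains no chunks")
    return chunks[0]["content"]


# Hypothetical response, reduced to the fields this helper reads.
sample_response = {"result": {"chunks": [{"content": "# Title\n\nBody text."}]}}
markdown = document_markdown(sample_response)
```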
FAQ
What is the difference between page and original_page?
We allow you to specify the page range within the original document that you want to parse. This is controlled by advanced_options -> page_range.
In these cases, it is useful to know the page number of the block within the returned context (e.g. of the pages parsed, this block was the 1st page) as well as the original page number in the source document (e.g. of the original document, this block was the 10th page).
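To make the distinction concrete, here is a small sketch that reports both numbers for one block. It assumes the block is a dict exposing page and original_page directly; where these fields sit in a real response is not stated here, so adjust the lookup to match your data. The helper name and sample values are hypothetical.

```python
def describe_block_location(block: dict) -> str:
    """Describe a block's position in the parsed range and in the source document."""
    return (
        f"page {block['page']} of the parsed range "
        f"(page {block['original_page']} of the source document)"
    )


# Hypothetical block: parsing started at page 10 of the source document,
# so the first parsed page maps back to original page 10.
block = {"page": 1, "original_page": 10}
location = describe_block_location(block)
```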
How do embed and content differ for the blocks?
For the most part, these fields are the same. However, Reducto's API can adjust the embed field to improve embedding performance. One example is the table summarization feature. In this case, the embed field contains the summarized table content, while the content field contains the original table content. We have found that this improves downstream performance for embedding models that are not as capable of reasoning over complex tabular HTML.
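One way to use the two fields, sketched under assumptions: send embed to the embedding model and keep content for display or for the LLM context. The helper name, the sample items, and the fallback to content when embed is missing are all assumptions of this sketch, not documented behavior.

```python
def text_for_embedding(item: dict) -> str:
    """Prefer the embedding-optimized text; fall back to the original content."""
    # The fallback is an assumption of this sketch, not documented API behavior.
    return item.get("embed") or item["content"]


# Hypothetical items: a summarized table and a plain paragraph.
table_item = {
    "content": "<table><tr><td>EMEA</td><td>1.2M</td></tr></table>",
    "embed": "Table of quarterly revenue by region.",
}
text_item = {"content": "Plain paragraph.", "embed": "Plain paragraph."}

to_embed = [text_for_embedding(item) for item in (table_item, text_item)]
```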
What is the result type?
For longer documents, the API may return a URL instead of the full result. In this case, the result type will be url and the result will contain a URL pointing to a JSON array of chunks. For smaller documents, the result type will be full and the result will contain the chunks directly in the response. The chunks have the same structure in both cases; the only difference is whether they are returned directly or need to be fetched from a URL. You can read more about the URL response format here.
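A client can hide this difference behind one helper. The sketch below assumes the result object exposes the type under a type key, the URL under a url key, and inline chunks under a chunks key; only the values url and full come from the description above, so treat the key names as assumptions. The fetch function is injectable so the logic can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(url: str):
    """Download and decode a JSON document (here, an array of chunks)."""
    with urlopen(url) as resp:
        return json.load(resp)


def resolve_chunks(result: dict, fetch=fetch_json) -> list:
    """Return the chunks whether they are inline or behind a URL."""
    # Key names "type", "url" and "chunks" are assumptions of this sketch.
    if result["type"] == "url":
        return fetch(result["url"])
    if result["type"] == "full":
        return result["chunks"]
    raise ValueError(f"unexpected result type: {result['type']!r}")


# Hypothetical results; the URL is a placeholder and is never requested here.
inline_result = {"type": "full", "chunks": [{"content": "small document"}]}
remote_result = {"type": "url", "url": "https://example.com/chunks.json"}

inline_chunks = resolve_chunks(inline_result)
remote_chunks = resolve_chunks(remote_result, fetch=lambda url: [{"content": "large document"}])
```

Because both branches return the same list-of-chunks shape, downstream code does not need to know which result type it received.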
How can I get images for figures and tables?
You can enable the return_figure_images parameter in the experimental options to get image URLs for figures in the document. Similarly, you can enable the return_table_images parameter to get image URLs for tables. When enabled, the corresponding blocks will include an image_url field that points to an image of the figure or table.
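Once those parameters are enabled, collecting the images is a matter of filtering blocks that carry an image_url. This sketch takes a flat list of block dicts; the type values and URLs in the sample are hypothetical placeholders, and how blocks are nested in a real response is not assumed here.

```python
def collect_image_urls(blocks: list[dict]) -> list[str]:
    """Return the image_url of every block that has one, in document order."""
    return [block["image_url"] for block in blocks if block.get("image_url")]


# Hypothetical blocks; only image_url is taken from the description above.
sample_blocks = [
    {"type": "Text", "content": "Intro paragraph."},
    {"type": "Figure", "image_url": "https://example.com/figure-1.png"},
    {"type": "Table", "image_url": "https://example.com/table-1.png"},
]
image_urls = collect_image_urls(sample_blocks)
```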
Can Reducto handle checkboxes?
Yes. You can enable the enable_checkboxes parameter in the experimental options to add them to the output.
Does Reducto handle skewed / rotated pages?
Yes. The rotate_pages parameter in the experimental options lets us automatically fix skewed pages for you. This is turned on by default.
Can you return equations and subscripts/superscripts?
Yes. You can enable the enable_equations or enable_scripts parameter in the experimental options for the output to contain equations or <sub> and <sup> tags, respectively.
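The parameters mentioned in this FAQ can be gathered into one options object. The sketch below is only an illustration: the experimental_options key name and the boolean value shape are assumptions inferred from the phrase "experimental options", and the helper is hypothetical. Check the request reference before sending such a payload.

```python
# Flags named in this FAQ; nothing else is assumed to exist.
KNOWN_FLAGS = {
    "return_figure_images",
    "return_table_images",
    "enable_checkboxes",
    "rotate_pages",
    "enable_equations",
    "enable_scripts",
}


def build_experimental_options(*flags: str) -> dict:
    """Build an options fragment that turns on the given flags."""
    unknown = set(flags) - KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unknown flags: {sorted(unknown)}")
    # "experimental_options" as the key name is an assumption of this sketch.
    return {"experimental_options": {flag: True for flag in flags}}


options = build_experimental_options("enable_checkboxes", "enable_equations")
```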