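import requests

url = "https://platform.reducto.ai/pipeline"

# Request body: the document to process and the pipeline to run on it.
payload = {
    "document_url": "<string>",
    "pipeline_id": "<string>",
}
# Bearer-token authentication; replace <token> with your auth token.
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)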
print(response.json())

{
  "job_id": "<string>",
  "usage": {
    "num_pages": 123,
    "credits": 123
  },
  "result": {
    "parse": {
      "job_id": "<string>",
      "duration": 123,
      "usage": {
        "num_pages": 123,
        "credits": 123
      },
      "result": {
        "type": "<string>",
        "chunks": [
          {
            "content": "<string>",
            "embed": "<string>",
            "enriched": "<string>",
            "blocks": [
              {
                "type": "Header",
                "bbox": {
                  "left": 123,
                  "top": 123,
                  "width": 123,
                  "height": 123,
                  "page": 123,
                  "original_page": 123
                },
                "content": "<string>",
                "image_url": "<string>",
                "confidence": "low",
                "granular_confidence": {
                  "extract_confidence": 123,
                  "parse_confidence": 123
                }
              }
            ],
            "enrichment_success": false
          }
        ],
        "ocr": {
          "words": [
            {
              "text": "<string>",
              "bbox": {
                "left": 123,
                "top": 123,
                "width": 123,
                "height": 123,
                "page": 123,
                "original_page": 123
              },
              "confidence": 123,
              "chunk_index": 123
            }
          ],
          "lines": [
            {
              "text": "<string>",
              "bbox": {
                "left": 123,
                "top": 123,
                "width": 123,
                "height": 123,
                "page": 123,
                "original_page": 123
              },
              "confidence": 123,
              "chunk_index": 123
            }
          ]
        },
        "custom": "<unknown>"
      },
      "pdf_url": "<string>",
      "studio_link": "<string>"
    },
    "extract": [
      {
        "split_name": "<string>",
        "page_range": [
          123
        ],
        "result": {
          "usage": {
            "num_pages": 123,
            "num_fields": 123,
            "credits": 123
          },
          "result": [
            "<unknown>"
          ],
          "citations": [
            "<unknown>"
          ],
          "job_id": "<string>",
          "studio_link": "<string>"
        },
        "partition": "<string>"
      }
    ],
    "split": {
      "usage": {
        "num_pages": 123,
        "credits": 123
      },
      "result": {
        "section_mapping": {},
        "splits": [
          {
            "name": "<string>",
            "pages": [
              123
            ],
            "conf": "low",
            "partitions": [
              {
                "name": "<string>",
                "pages": [
                  123
                ],
                "conf": "low"
              }
            ]
          }
        ]
      }
    },
    "edit": {
      "document_url": "<string>",
      "form_schema": [
        {
          "bbox": {
            "left": 123,
            "top": 123,
            "width": 123,
            "height": 123,
            "page": 123,
            "original_page": 123
          },
          "description": "<string>",
          "type": "text",
          "fill": true,
          "value": "<string>"
        }
      ]
    }
  }
}

Authorization (header): Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
document_url: The URL of the document to be processed. You can provide one of the following:
1. A publicly available URL
2. A presigned S3 URL
3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
pipeline_id: The ID of the pipeline to use for the document.
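
The two request fields can be assembled with a small helper. The following is only a sketch: build_pipeline_payload and ALLOWED_SCHEMES are hypothetical names, and the scheme check assumes that the three documented URL forms always start with http, https, or reducto, which the reference does not state explicitly.

```python
from urllib.parse import urlparse

# Assumption: these schemes cover the three documented URL forms
# (public URL, presigned S3 URL, reducto:// upload reference).
ALLOWED_SCHEMES = {"http", "https", "reducto"}


def build_pipeline_payload(document_url: str, pipeline_id: str) -> dict:
    """Return the JSON body for POST /pipeline, rejecting obviously bad input."""
    scheme = urlparse(document_url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported document_url scheme: {scheme!r}")
    if not pipeline_id:
        raise ValueError("pipeline_id must be a non-empty string")
    return {"document_url": document_url, "pipeline_id": pipeline_id}


# Placeholder identifiers, not real values.
payload = build_pipeline_payload("reducto://example-file-id", "example-pipeline-id")
```

The returned dictionary can be passed as the json argument of requests.post, exactly like the payload in the request example.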
Successful Response

result.parse.duration: The duration of the parse request in seconds.
result.parse.result: The response from the document processing service. There are two types of response, Full Result and URL Result, because of limits on the maximum return size over HTTPS. If the response is too large, it is returned as a presigned URL in the URL Result. Your application should handle both cases.
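
Handling both result types might look like the sketch below. Only type = "full" is confirmed by this page; the name of the field that carries the presigned link in a URL Result is an assumption here ("url"), as are the helper names fetch_json and resolve_parse_result.

```python
import json
from typing import Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict:
    """Download and decode a JSON document from a presigned URL."""
    with urlopen(url, timeout=30) as resp:
        return json.load(resp)


def resolve_parse_result(result: dict,
                         fetch: Callable[[str], dict] = fetch_json) -> dict:
    """Return the full parse result, following the URL variant when needed."""
    if result.get("type") == "full":
        return result
    # Assumption: the URL variant exposes its presigned link under "url".
    url = result.get("url")
    if not url:
        raise ValueError(f"unrecognized result type: {result.get('type')!r}")
    return fetch(url)
```

Passing the downloader in as a parameter keeps the branching logic testable without network access; in an application, resolve_parse_result(response.json()["result"]["parse"]["result"]) would be the typical call.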
result.parse.result.type: "full"
result.parse.result.chunks[].content: The content of the chunk extracted from the document.
result.parse.result.chunks[].embed: Chunk content optimized for embedding and retrieval.
result.parse.result.chunks[].enriched: The enriched content of the chunk extracted from the document.
result.parse.result.chunks[].blocks[].type: The type of block extracted from the document. One of: Header, Footer, Title, Section Header, Page Number, List Item, Figure, Table, Key Value, Text, Comment, Signature.
result.parse.result.chunks[].blocks[].bbox: The bounding box of the block extracted from the document.
result.parse.result.chunks[].blocks[].bbox.page: The page number of the bounding box (1-indexed).
result.parse.result.chunks[].blocks[].bbox.original_page: The page number in the original document of the bounding box (1-indexed).
result.parse.result.chunks[].blocks[].content: The content of the block extracted from the document.
result.parse.result.chunks[].blocks[].image_url: (Experimental) The URL of the image associated with the block.
result.parse.result.chunks[].blocks[].confidence: The confidence for the block. It is either low or high and takes into account factors such as OCR and table structure.
result.parse.result.chunks[].blocks[].granular_confidence: Granular confidence scores for the block, as a dictionary. The scores will not be None if the user has enabled numeric confidence scores.
result.parse.result.chunks[].enrichment_success: Whether the enrichment was successful.
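
As an illustration of walking the chunk and block structure, the sketch below collects blocks whose confidence is "low" so they can be reviewed separately. The function name low_confidence_blocks and the sample data are made up for the example; only the field names come from the response shown on this page.

```python
def low_confidence_blocks(parse_result: dict) -> list[dict]:
    """Collect blocks flagged "low", with the chunk index and page they came from."""
    flagged = []
    for chunk_index, chunk in enumerate(parse_result.get("chunks", [])):
        for block in chunk.get("blocks", []):
            if block.get("confidence") == "low":
                flagged.append({
                    "chunk_index": chunk_index,
                    "type": block["type"],
                    "page": block["bbox"]["page"],
                    "content": block["content"],
                })
    return flagged


# Made-up sample in the shape of a Full Result.
sample_result = {
    "type": "full",
    "chunks": [
        {
            "content": "Invoice 42",
            "blocks": [
                {"type": "Title", "bbox": {"page": 1}, "content": "Invoice 42",
                 "confidence": "high"},
                {"type": "Table", "bbox": {"page": 2}, "content": "| a | b |",
                 "confidence": "low"},
            ],
        }
    ],
}
flagged_blocks = low_confidence_blocks(sample_result)
```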
result.parse.result.ocr.words[].bbox.page: The page number of the bounding box (1-indexed).
result.parse.result.ocr.words[].bbox.original_page: The page number in the original document of the bounding box (1-indexed).
result.parse.result.ocr.words[].confidence: OCR confidence score between 0 and 1, where 1 indicates the highest confidence.
result.parse.result.ocr.words[].chunk_index: The index of the chunk that the word belongs to.
result.parse.result.ocr.lines[].bbox.page: The page number of the bounding box (1-indexed).
result.parse.result.ocr.lines[].bbox.original_page: The page number in the original document of the bounding box (1-indexed).
result.parse.result.ocr.lines[].confidence: OCR confidence score between 0 and 1, where 1 indicates the highest confidence.
result.parse.result.ocr.lines[].chunk_index: The index of the chunk that the line belongs to.
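
The chunk_index and confidence fields make it possible to tie OCR words back to chunks and to drop uncertain words. A minimal sketch, assuming a hypothetical helper name words_by_chunk and an arbitrary threshold of 0.5 (the reference does not recommend any particular cutoff):

```python
from collections import defaultdict


def words_by_chunk(ocr: dict, min_confidence: float = 0.5) -> dict[int, list[str]]:
    """Group OCR word texts by chunk_index, keeping words at or above the threshold."""
    grouped = defaultdict(list)
    for word in ocr.get("words", []):
        if word["confidence"] >= min_confidence:
            grouped[word["chunk_index"]].append(word["text"])
    return dict(grouped)


# Made-up sample in the shape of the "ocr" object.
sample_ocr = {
    "words": [
        {"text": "Total", "confidence": 0.98, "chunk_index": 0},
        {"text": "d0e", "confidence": 0.21, "chunk_index": 0},
        {"text": "Signature", "confidence": 0.87, "chunk_index": 1},
    ],
    "lines": [],
}
kept_words = words_by_chunk(sample_ocr)
```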
result.parse.pdf_url: The storage URL of the converted PDF file.
result.parse.studio_link: The link to the studio pipeline for the document.
result.extract: This is the response format for Extract -> Split Pipelines.
result.extract[].result.result: The extracted response in your provided schema. This is a list of dictionaries. If disable_chunking is True (default), it will be a list of length one.
result.extract[].result.citations: The citations corresponding to the extracted response.
result.extract[].result.studio_link: The link to the studio pipeline for the document.
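
Because each extract entry carries its own list of extracted dictionaries, it can be convenient to group them by split. The sketch below assumes a hypothetical helper records_by_split and made-up sample values; the field names are taken from the response shown on this page.

```python
def records_by_split(extract_entries: list[dict]) -> dict[str, list]:
    """Group extracted records under the split they were extracted from."""
    grouped: dict[str, list] = {}
    for entry in extract_entries:
        records = entry["result"]["result"]
        grouped.setdefault(entry["split_name"], []).extend(records)
    return grouped


# Made-up sample in the shape of the "extract" array.
sample_extract = [
    {"split_name": "invoice", "page_range": [1], "partition": "A",
     "result": {"result": [{"total": "10.00"}], "citations": []}},
    {"split_name": "invoice", "page_range": [3], "partition": "B",
     "result": {"result": [{"total": "25.50"}], "citations": []}},
    {"split_name": "receipt", "page_range": [5], "partition": "C",
     "result": {"result": [{"total": "4.20"}], "citations": []}},
]
grouped_records = records_by_split(sample_extract)
```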
result.split.result: The split result.
result.split.result.splits[].conf: One of: high, low.
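
A split result can be reduced to a simple mapping from split name to pages, optionally keeping only splits marked "high". This is a sketch with a hypothetical helper name (pages_by_split) and made-up sample data.

```python
def pages_by_split(split_result: dict, require_high: bool = False) -> dict[str, list[int]]:
    """Map each split name to its pages, optionally skipping low-confidence splits."""
    pages: dict[str, list[int]] = {}
    for split in split_result.get("splits", []):
        if require_high and split.get("conf") != "high":
            continue
        pages[split["name"]] = list(split["pages"])
    return pages


# Made-up sample in the shape of split.result.
sample_split = {
    "section_mapping": {},
    "splits": [
        {"name": "cover_letter", "pages": [1], "conf": "high", "partitions": []},
        {"name": "appendix", "pages": [7, 8], "conf": "low", "partitions": []},
    ],
}
all_pages = pages_by_split(sample_split)
confident_pages = pages_by_split(sample_split, require_high=True)
```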
result.edit.document_url: Presigned URL to download the edited document.
result.edit.form_schema: Form schema for PDF forms. List of widgets with their types, descriptions, and bounding boxes.
result.edit.form_schema[].bbox: Bounding box coordinates of the widget.
result.edit.form_schema[].bbox.page: The page number of the bounding box (1-indexed).
result.edit.form_schema[].bbox.original_page: The page number in the original document of the bounding box (1-indexed).
result.edit.form_schema[].description: Description of the widget extracted from the document.
result.edit.form_schema[].type: Type of the form widget. One of: text, checkbox, dropdown, barcode.
result.edit.form_schema[].fill: If True (default), the system will attempt to fill this widget. If False, the widget will be created but intentionally left unfilled.
result.edit.form_schema[].value: If provided, this value will be used directly instead of attempting to intelligently determine the field value.
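
To see which widgets were deliberately skipped, the form_schema list can be filtered on its fill flag. The sketch below uses a hypothetical helper name (widgets_left_unfilled) and made-up widgets; a missing fill key is treated as True, matching the documented default.

```python
def widgets_left_unfilled(form_schema: list[dict]) -> list[str]:
    """Return the descriptions of widgets whose fill flag is False."""
    return [w["description"] for w in form_schema if not w.get("fill", True)]


# Made-up sample in the shape of edit.form_schema.
sample_form_schema = [
    {"description": "Applicant name", "type": "text", "fill": True, "value": "Ada"},
    {"description": "Office use only", "type": "text", "fill": False},
    {"description": "Agree to terms", "type": "checkbox"},
]
unfilled = widgets_left_unfilled(sample_form_schema)
```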