POST /parse
import requests

url = "https://platform.reducto.ai/parse"

payload = {
    "options": {
        "ocr_mode": "standard",
        "extraction_mode": "ocr",
        "chunking": {
            "chunk_mode": "variable",
            "chunk_size": 123
        },
        "table_summary": {
            "enabled": False,
            "prompt": "<string>"
        },
        "figure_summary": {
            "enabled": False,
            "prompt": "<string>",
            "override": False
        },
        "filter_blocks": ["Page Number", "Header", "Footer", "Comment"],
        "force_url_result": False
    },
    "advanced_options": {
        "ocr_system": "highres",
        "table_output_format": "html",
        "merge_tables": False,
        "continue_hierarchy": True,
        "keep_line_breaks": False,
        "page_range": {
            "start": 123,
            "end": 123
        },
        "force_file_extension": "<string>",
        "large_table_chunking": {
            "enabled": True,
            "size": 50
        },
        "spreadsheet_table_clustering": "default",
        "add_page_markers": False,
        "remove_text_formatting": False,
        "return_ocr_data": False,
        "document_password": "<string>",
        "filter_line_numbers": False
    },
    "experimental_options": {
        "enrich": {
            "enabled": False,
            "mode": "standard",
            "prompt": "<string>"
        },
        "native_office_conversion": False,
        "enable_checkboxes": False,
        "enable_equations": False,
        "rotate_pages": True,
        "enable_underlines": False,
        "enable_scripts": False,
        "return_figure_images": False,
        "return_table_images": False,
        "danger_filter_wide_boxes": False
    },
    "document_url": "<string>",
    "priority": True
}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

# Send the payload as a JSON body; requests.post is the idiomatic shorthand for a POST.
response = requests.post(url, json=payload, headers=headers)

print(response.text)
{
  "job_id": "<string>",
  "duration": 123,
  "pdf_url": "<string>",
  "usage": {
    "num_pages": 123
  },
  "result": {
    "type": "full",
    "chunks": [
      {
        "content": "<string>",
        "embed": "<string>",
        "enriched": "<string>",
        "enrichment_success": false,
        "blocks": [
          {
            "type": "Header",
            "bbox": {
              "left": 123,
              "top": 123,
              "width": 123,
              "height": 123,
              "page": 123,
              "original_page": 123
            },
            "content": "<string>",
            "image_url": "<string>"
          }
        ]
      }
    ],
    "ocr": {
      "words": [
        {
          "text": "<string>",
          "bbox": {
            "left": 123,
            "top": 123,
            "width": 123,
            "height": 123,
            "page": 123,
            "original_page": 123
          }
        }
      ],
      "lines": [
        {
          "text": "<string>",
          "bbox": {
            "left": 123,
            "top": 123,
            "width": 123,
            "height": 123,
            "page": 123,
            "original_page": 123
          }
        }
      ]
    },
    "custom": "<any>"
  }
}
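
To make the response shape above concrete, here is a minimal sketch that walks a full result and groups block contents by page. The helper name `blocks_by_page` and the sample values are hypothetical and are not part of the API; only the field names (`result`, `chunks`, `blocks`, `bbox`, `page`, `type`, `content`) come from the sample response.

```python
from collections import defaultdict


def blocks_by_page(response: dict) -> dict:
    """Group (block type, block content) pairs by the page they appear on.

    Hypothetical helper: it only handles a result whose "type" is "full".
    """
    result = response["result"]
    if result.get("type") != "full":
        raise ValueError("expected a result of type 'full'")

    pages = defaultdict(list)
    for chunk in result.get("chunks", []):
        for block in chunk.get("blocks", []):
            page = block["bbox"]["page"]
            pages[page].append((block["type"], block["content"]))
    return dict(pages)


# Made-up sample that mirrors the documented response structure.
sample = {
    "job_id": "job-1",
    "duration": 1.5,
    "usage": {"num_pages": 2},
    "result": {
        "type": "full",
        "chunks": [
            {
                "content": "Title",
                "blocks": [
                    {"type": "Header", "bbox": {"page": 1}, "content": "Title"},
                    {"type": "Footer", "bbox": {"page": 2}, "content": "Page 2"},
                ],
            }
        ],
    },
}

grouped = blocks_by_page(sample)
```

Grouping by `bbox.page` is only one possible view; the same loop could collect `chunk["content"]` instead if chunk-level text is all you need.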

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
document_url
required

The URL of the document to be processed. You can provide one of the following:

  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
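
As an illustration of the three accepted URL forms, the sketch below builds a minimal request body and rejects URLs whose scheme is clearly unsupported. The helper `build_parse_payload` and the `ALLOWED_SCHEMES` set are assumptions made for this example; the API itself may accept or reject URLs differently, and a client-side check cannot tell whether a URL is actually public or correctly presigned.

```python
from urllib.parse import urlparse

# Assumption: public and presigned S3 URLs use http(s); uploads use reducto://.
ALLOWED_SCHEMES = {"http", "https", "reducto"}


def build_parse_payload(document_url: str, priority: bool = True) -> dict:
    """Return a minimal /parse request body (hypothetical helper)."""
    scheme = urlparse(document_url).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unsupported document_url scheme: {scheme!r}")
    # Only document_url is required; option objects are omitted here.
    return {"document_url": document_url, "priority": priority}


payload_from_upload = build_parse_payload("reducto://example-file-id")
payload_from_web = build_parse_payload("https://example.com/report.pdf", priority=False)
```

The resulting dictionary can be passed as the `json=` argument of the request shown at the top of this page.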
options
object
advanced_options
object
experimental_options
object
priority
boolean
default: true

If true, the job is processed with priority when the user has priority processing budget available. By default, sync jobs are prioritized above async jobs.

Response

200
application/json
Successful Response
job_id
string
required
duration
number
required

The duration of the parse request in seconds.

usage
object
required
result
object
required

The response from the document processing service. There are two response types, Full Result and URL Result, because the size of an HTTPS response is limited. If the result is too large to return inline, it is returned as a presigned URL in a URL Result instead. Your application should handle both cases.
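
One way to handle the two result types is sketched below. The sample response on this page only shows `"type": "full"`, so the URL Result shape used here (a `"type"` of `"url"` plus a `"url"` field pointing at a JSON document) is an assumption to check against the actual schema. The helper `resolve_result` and the injected `fetch` callable are hypothetical.

```python
import json
from typing import Callable


def resolve_result(result: dict, fetch: Callable[[str], str]) -> dict:
    """Return the full result, downloading it first if only a URL was returned.

    `fetch` takes a URL and returns the response body as text.
    """
    kind = result.get("type")
    if kind == "full":
        return result
    if kind == "url":
        # Assumption: the presigned URL serves the full result as JSON.
        return json.loads(fetch(result["url"]))
    raise ValueError(f"unexpected result type: {kind!r}")


# Stand-in for an HTTP download so the example runs without network access.
def fake_fetch(url: str) -> str:
    return json.dumps({"type": "full", "chunks": []})


inline = resolve_result({"type": "full", "chunks": []}, fake_fetch)
downloaded = resolve_result(
    {"type": "url", "url": "https://example.com/presigned"}, fake_fetch
)
```

In a real client, `fetch` could be something like `lambda u: requests.get(u).text`; passing it in as a parameter keeps the branching logic easy to test.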

pdf_url
string | null

The storage URL of the converted PDF file.
