POST /extract
import requests

url = "https://platform.reducto.ai/extract"

payload = {
    "options": {
        "ocr_mode": "standard",
        "extraction_mode": "ocr",
        "chunking": {
            "chunk_mode": "variable",
            "chunk_size": 123
        },
        "table_summary": {
            "enabled": False,
            "prompt": "<string>"
        },
        "figure_summary": {
            "enabled": False,
            "prompt": "<string>",
            "override": False
        },
        "filter_blocks": ["Page Number", "Header", "Footer", "Comment"],
        "force_url_result": False
    },
    "advanced_options": {
        "ocr_system": "highres",
        "table_output_format": "html",
        "merge_tables": False,
        "continue_hierarchy": True,
        "keep_line_breaks": False,
        "page_range": {
            "start": 123,
            "end": 123
        },
        "force_file_extension": "<string>",
        "large_table_chunking": {
            "enabled": True,
            "size": 50
        },
        "spreadsheet_table_clustering": "default",
        "add_page_markers": False,
        "remove_text_formatting": False,
        "return_ocr_data": False,
        "document_password": "<string>",
        "filter_line_numbers": False
    },
    "experimental_options": {
        "enrich": {
            "enabled": False,
            "mode": "standard",
            "prompt": "<string>"
        },
        "native_office_conversion": False,
        "enable_checkboxes": False,
        "enable_equations": False,
        "rotate_pages": True,
        "enable_underlines": False,
        "enable_scripts": False,
        "return_figure_images": False,
        "return_table_images": False,
        "danger_filter_wide_boxes": False
    },
    "document_url": "<string>",
    "schema": "<any>",
    "system_prompt": "Be precise and thorough.",
    "generate_citations": False,
    "array_extract": {
        "enabled": False,
        "mode": "legacy",
        "pages_per_segment": 10,
        "streaming_extract_item_density": 50
    },
    "use_chunking": False,
    "priority": True
}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
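
Example 200 response: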
{
  "usage": {
    "num_pages": 123,
    "num_fields": 123
  },
  "result": [
    "<any>"
  ],
  "citations": [
    "<any>"
  ]
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.

Body

application/json
document_url
required

The URL of the document to be processed. You can provide one of the following:

  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document (see the sketch after this list)
  4. A job_id (jobid://) or a list of job_ids
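
For example, a minimal sketch of the upload-then-extract flow, reusing the payload from the request example above. It assumes the /upload response carries the reducto:// identifier in a field named file_id; that field name is an assumption, not confirmed by this page.

import requests

headers = {"Authorization": "Bearer <token>"}

# Upload a local document; the response contains a reducto:// identifier.
with open("document.pdf", "rb") as f:
    upload = requests.post(
        "https://platform.reducto.ai/upload",
        headers=headers,
        files={"file": f},
    )

# "file_id" is an assumed name for the field holding the reducto:// URL.
payload["document_url"] = upload.json()["file_id"]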
schema
any
required

The JSON schema to use for extraction.
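As a sketch, a schema that pulls two fields from an invoice might look like the following; the field names and descriptions are illustrative, not part of the API.

payload["schema"] = {
    "type": "object",
    "properties": {
        "invoice_number": {
            "type": "string",
            "description": "The invoice's unique identifier",
        },
        "total_amount": {
            "type": "number",
            "description": "The final amount due",
        },
    },
    "required": ["invoice_number", "total_amount"],
}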

options
object
advanced_options
object
experimental_options
object
system_prompt
string
default:Be precise and thorough.

A system prompt to use for the extraction. This is a general prompt that is applied to the entire document before any other prompts.

generate_citations
boolean
default:false

Whether citations should be generated for the extracted content.

array_extract
object

The configuration options for array extract.
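
For example, turning array extract on with the same knob values as the request example above (a sketch; nothing here beyond the field names is prescribed by this page):

payload["array_extract"] = {
    "enabled": True,
    "mode": "legacy",
    "pages_per_segment": 10,
    "streaming_extract_item_density": 50,
}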

use_chunking
boolean
default:false

Whether chunking should be used for the extraction.

priority
boolean
default:true

If True, attempts to process the job with priority if the user has priority processing budget available; by default, sync jobs are prioritized above async jobs.

Response

200
application/json
Successful Response
usage
object
required
result
any[]
required

The extracted response in your provided schema. This is a list of dictionaries. If use_chunking is False (the default), it will be a list of length one.
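
For instance, reading the extracted fields back out of the response from the request example above:

data = response.json()

print(data["usage"]["num_pages"])

# With use_chunking False (the default), result holds a single dictionary.
extracted = data["result"][0]
print(extracted)

# citations is null unless generate_citations was True in the request.
for citation in data["citations"] or []:
    print(citation)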

citations
any[] | null
required

The citations corresponding to the extracted response.
