POST /extract
Extract
import requests

url = "https://platform.reducto.ai/extract"

payload = {
    "input": "<string>",
    "parsing": {
        "enhance": {
            "agentic": [],
            "summarize_figures": True
        },
        "retrieval": {
            "chunking": { "chunk_mode": "disabled" },
            "embedding_optimized": False,
            "filter_blocks": []
        },
        "formatting": {
            "add_page_markers": False,
            "include": [],
            "merge_tables": False,
            "table_output_format": "dynamic"
        },
        "spreadsheet": {
            "clustering": "accurate",
            "exclude": [],
            "include": [],
            "split_large_tables": {
                "enabled": True,
                "size": 50
            }
        },
        "settings": {
            "embed_pdf_metadata": False,
            "force_url_result": False,
            "ocr_system": "standard",
            "persist_results": False,
            "return_images": [],
            "return_ocr_data": False,
            "timeout": 900
        }
    },
    "instructions": { "system_prompt": "Be precise and thorough." },
    "settings": {
        "include_images": False,
        "optimize_for_latency": False,
        "array_extract": False,
        "citations": {
            "enabled": True,
            "numerical_confidence": True
        }
    }
}
headers = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json"
}

response = requests.post(url, json=payload, headers=headers)

print(response.json())
{
  "job_id": "<string>",
  "usage": {
    "num_pages": 123,
    "num_fields": 123,
    "credits": 123
  },
  "studio_link": "<string>",
  "result": "<any>"
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
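As a minimal sketch of the header format described above, the helper below builds the request headers from a token. The function name `build_headers` is hypothetical and not part of any SDK; it only mirrors the `Authorization` and `Content-Type` headers shown in the request sample.

```python
def build_headers(token: str) -> dict:
    """Build request headers in the documented `Bearer <token>` form.

    `build_headers` is an illustrative helper, not an official API.
    """
    token = (token or "").strip()
    if not token:
        raise ValueError("an auth token is required")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }


# Example usage with a placeholder token.
headers = build_headers("<token>")
```

In practice the token would typically be loaded from an environment variable or secret store rather than written into source code.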

Body

application/json
  • SyncExtractConfig
  • AsyncExtractConfig
input
required

The URL of the document to be processed. You can provide one of the following:
  1. A publicly available URL
  2. A presigned S3 URL
  3. A reducto:// prefixed URL obtained from the /upload endpoint after directly uploading a document
  4. A jobid:// prefixed URL obtained from a previous /parse invocation
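The accepted forms of `input` can be told apart by URL scheme. The sketch below is a client-side check only, under the assumption that public and presigned S3 URLs both use `http` or `https`; the function name `classify_input` and its return labels are hypothetical, and the service may apply its own validation.

```python
from urllib.parse import urlparse


def classify_input(url: str) -> str:
    """Label an `input` value by its URL scheme (illustrative only)."""
    scheme = urlparse(url).scheme.lower()
    if scheme == "reducto":
        return "upload"      # obtained from the /upload endpoint
    if scheme == "jobid":
        return "parse_job"   # obtained from a previous /parse invocation
    if scheme in ("http", "https"):
        return "url"         # public URL or presigned S3 URL
    raise ValueError(f"unsupported input: {url!r}")


# Example usage with placeholder identifiers.
kind = classify_input("jobid://<job-id>")
```

A presigned S3 URL cannot be reliably distinguished from any other HTTPS URL by scheme alone, so both fall under the same label here.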

parsing
object

The configuration options for parsing the document. If you are passing in a jobid:// URL for the file, then this configuration will be ignored.
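Because `parsing` is ignored for `jobid://` inputs, a client can leave it out in that case to keep the request explicit. The following is a sketch of one way to assemble the body; `build_payload` is a hypothetical helper, and omitting the key is a client-side choice rather than something the API requires.

```python
from typing import Optional


def build_payload(
    input_url: str,
    parsing: Optional[dict] = None,
    instructions: Optional[dict] = None,
    settings: Optional[dict] = None,
) -> dict:
    """Assemble an /extract request body (illustrative helper).

    `parsing` is dropped for jobid:// inputs, since the documentation
    states that the configuration is ignored for them.
    """
    payload = {"input": input_url}
    if parsing is not None and not input_url.startswith("jobid://"):
        payload["parsing"] = parsing
    if instructions is not None:
        payload["instructions"] = instructions
    if settings is not None:
        payload["settings"] = settings
    return payload


# Example usage: the parsing options are not sent for a jobid:// input.
payload = build_payload(
    "jobid://<job-id>",
    parsing={"settings": {"ocr_system": "standard"}},
    instructions={"system_prompt": "Be precise and thorough."},
)
```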

instructions
object

The instructions to use for the extraction.

settings
object

The settings to use for the extraction.

Response

Successful Response

  • V3ExtractResponse
  • AsyncExtractResponse
usage
object
required
result
required

The extracted response in your provided schema. This is a list of dictionaries. If disable_chunking is True (default), then it will be a list of length one.
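Since `result` is described as a list of dictionaries that has length one in the default case, a caller may want to unwrap it defensively. The sketch below assumes the response body has already been decoded from JSON; the helper names and the `invoice_number` field are hypothetical, and the actual keys depend on the schema you provide.

```python
def extract_records(body: dict) -> list:
    """Return `result` as a list of dictionaries, validating its shape."""
    result = body.get("result")
    if not isinstance(result, list) or not all(isinstance(r, dict) for r in result):
        raise ValueError("expected `result` to be a list of dictionaries")
    return result


def single_record(body: dict) -> dict:
    """Return the only record when `result` has exactly one entry."""
    records = extract_records(body)
    if len(records) != 1:
        raise ValueError(f"expected one record, got {len(records)}")
    return records[0]


# Example usage with a made-up response body.
body = {"job_id": "<string>", "result": [{"invoice_number": "INV-001"}]}
record = single_record(body)
```

Raising on an unexpected shape, rather than silently taking the first element, makes it easier to notice when the result contains more than one entry.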

job_id
string | null

studio_link

The link to the studio pipeline for the document.