Documents and Processing

Set chunking to disabled (the default). The entire document will be returned as a single chunk with all content in the content field as Markdown.
result = client.parse.run(
    input=upload.file_id,
    retrieval={"chunking": {"chunk_mode": "disabled"}}
)

# Full document as markdown
markdown = result.result.chunks[0].content
Tables will be formatted according to your table_output_format setting (default: dynamic, which uses Markdown for simple tables).
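If you need a specific table representation, you can set table_output_format explicitly. The snippet below is a sketch only; it assumes the option sits alongside the other parse settings, so confirm the exact nesting against the Parse API reference.
result = client.parse.run(
    input=upload.file_id,
    # Assumed location of the option -- check the Parse API reference.
    settings={"table_output_format": "html"}  # e.g. "html", "markdown", or the default "dynamic"
)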
Both content and embed contain the chunk’s text, but each is optimized for a different purpose:
  • content: Raw extraction with original formatting. Tables appear as HTML or Markdown. Use for display.
  • embed: Optimized for vector embeddings. When embedding_optimized: true, tables become natural language summaries like “This table shows quarterly revenue…” which embed better.
For RAG: Use embed for your vector database, content for displaying results to users. See Understanding Chunks for details.
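For example (a quick sketch reusing the result from above):
chunk = result.result.chunks[0]
print(chunk.content)  # raw extraction with original formatting -- display this to users
print(chunk.embed)    # embedding-optimized text -- send this to your embedding model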
  1. Parse with variable chunking to get semantically meaningful segments
  2. Enable embedding optimization so tables become natural language
  3. Filter noise like headers and footers
  4. Store chunks in your vector database
result = client.parse.run(
    input=upload.file_id,
    retrieval={
        "chunking": {"chunk_mode": "variable", "chunk_size": 1000},
        "embedding_optimized": True,
        "filter_blocks": ["Header", "Footer", "Page Number"]
    }
)

for chunk in result.result.chunks:
    your_vector_db.insert(
        embedding=your_embedding_function(chunk.embed),
        metadata={"content": chunk.content, "blocks": chunk.blocks}
    )
See Parse Best Practices for more.
Settings that add significant latency:
  • enhance.agentic with any scope (runs VLM passes)
  • enhance.agentic[].advanced_chart_agent: true (detailed chart analysis)
  • Large documents with embedding_optimized: true
Moderate impact:
  • settings.return_images (generates cropped images)
  • settings.embed_pdf_metadata (modifies PDF)
Minimal impact:
  • Chunking mode changes
  • Table output format changes
  • Block filtering
For latency-sensitive applications, disable agentic modes and use async with priority for the fastest response.
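As a rough sketch, a latency-lean request simply leaves out the expensive options discussed above and keeps only low-impact ones:
# Latency-lean parse: no agentic enhancement, no embedding optimization,
# no cropped images -- only low-impact options are set.
result = client.parse.run(
    input=upload.file_id,
    retrieval={"chunking": {"chunk_mode": "variable", "chunk_size": 1000}}
)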
No. Reducto manages model selection internally to optimize for accuracy, cost, and latency. The models used may change as we improve the system. For on-premise deployments, you can configure which LLM providers are available. See LLM Configuration.

URLs and Retention

  • Image URLs (image_url from return_images): Valid for 1 hour
  • PDF URLs (pdf_url): Valid for 1 hour
  • Result URLs (when type: "url"): Valid for 1 hour
Download or process any URLs promptly after receiving them.
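For example, cropped block images can be fetched and stored right away (a sketch; the image_url attribute is assumed to live on each block when return_images is enabled):
import requests

# All returned URLs expire after one hour, so download immediately and keep your own copy.
for i, chunk in enumerate(result.result.chunks):
    for j, block in enumerate(chunk.blocks):
        image_url = getattr(block, "image_url", None)  # assumed location; present with return_images
        if image_url:
            image_bytes = requests.get(image_url).content
            with open(f"chunk{i}_block{j}.png", "wb") as f:
                f.write(image_bytes)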
By default, job results are deleted after 12 hours per Reducto’s zero data retention (ZDR) policy. To keep results longer:
result = client.parse.run(
    input=upload.file_id,
    settings={"persist_results": True}
)
With persist_results: true, results are stored indefinitely and can be retrieved anytime using the job ID. This requires opting in to Reducto Studio.
When the response exceeds approximately 6MB, Reducto returns result.type: "url" instead of result.type: "full". Fetch the content from result.url:
import requests

if result.result.type == "url":
    # Large result: fetch the full payload from the temporary URL
    chunks = requests.get(result.result.url).json()
else:
    chunks = result.result.chunks
To always get URL responses (for consistent handling):
result = client.parse.run(
    input=upload.file_id,
    settings={"force_url_result": True}
)
Jobs are deleted after 12 hours per the zero data retention policy. If you’re looking for a job from more than 12 hours ago, it has been automatically deleted. To prevent this:
  • Process results immediately when you receive them
  • Store results in your own database
  • Use persist_results: true to keep results indefinitely
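A minimal version of the second option (a sketch using SQLite; the job_id attribute on the response is an assumption, and your schema will differ):
import sqlite3

# Write chunks to your own storage as soon as the job completes,
# since the job itself is deleted after 12 hours.
conn = sqlite3.connect("parse_results.db")
conn.execute("CREATE TABLE IF NOT EXISTS chunks (job_id TEXT, idx INTEGER, content TEXT, embed TEXT)")
for i, chunk in enumerate(result.result.chunks):
    conn.execute(
        "INSERT INTO chunks VALUES (?, ?, ?, ?)",
        (result.job_id, i, chunk.content, chunk.embed),  # result.job_id is assumed; adapt to your response object
    )
conn.commit()
conn.close()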

API and Integration

  • Direct upload (/upload): 100MB
  • Presigned URL upload: 5GB
  • URL passthrough: No limit (Reducto fetches the file)
For files over 100MB, use the presigned URL method. For files over 5GB, host them on S3 or another storage service and pass the URL directly.
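For example, passing a hosted file by URL might look like this (a sketch; the bucket URL is a placeholder, and it assumes the SDK accepts a document URL in the same input parameter):
# URL passthrough: Reducto fetches the file itself, so the upload size limits don't apply.
result = client.parse.run(
    input="https://your-bucket.s3.amazonaws.com/large-document.pdf"  # placeholder; must be reachable by Reducto (e.g. a presigned URL)
)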
Visit status.reducto.ai for real-time status of all Reducto services, uptime history, and incident reports. Subscribe to updates to get notified of any service disruptions.

Still Have Questions?