The `parse.run()` method converts documents into structured JSON containing text, tables, and figures. It runs OCR, detects the document layout, and returns content organized into chunks suited to LLM and RAG workflows.
Basic Usage
Method Signature
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `input` | `str \| list[str]` | Yes | File ID (reducto://...), URL, or jobid:// reference |
| `enhance` | `dict \| None` | No | Enhancement options (agentic mode, figure summaries) |
| `formatting` | `dict \| None` | No | Output formatting (table formats, metadata) |
| `retrieval` | `dict \| None` | No | Chunking and filtering options |
| `settings` | `dict \| None` | No | Processing settings (page range, OCR, timeouts) |
| `spreadsheet` | `dict \| None` | No | Spreadsheet-specific options |
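Read as a Python signature, the table above corresponds roughly to the stub below. This is a hypothetical stand-in written for illustration, not the SDK's own definition: the name `parse_run_stub`, the keyword-only style, and the `None` defaults are assumptions drawn from the Type and Required columns.

```python
from __future__ import annotations

from typing import Any


def parse_run_stub(
    *,
    input: str | list[str],
    enhance: dict | None = None,
    formatting: dict | None = None,
    retrieval: dict | None = None,
    settings: dict | None = None,
    spreadsheet: dict | None = None,
) -> dict[str, Any]:
    """Hypothetical stand-in mirroring the parameter table; not the real SDK method.

    Returns the request arguments with unset (None) options dropped.
    """
    if not isinstance(input, (str, list)):
        raise TypeError("input must be a str or a list of str")
    options = {
        "enhance": enhance,
        "formatting": formatting,
        "retrieval": retrieval,
        "settings": settings,
        "spreadsheet": spreadsheet,
    }
    request: dict[str, Any] = {"input": input}
    # Only options the caller actually set are forwarded.
    request.update({key: value for key, value in options.items() if value is not None})
    return request
```

Only `input` is required, so a call can pass it alone and add option groups as needed.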
Input Options
The `input` parameter accepts a file ID (reducto://...), a URL, or a jobid:// reference.
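To make the three string forms from the parameter table concrete, the sketch below labels an input by its prefix. `classify_input` is a hypothetical helper, not part of the SDK, and treating any `http://` or `https://` string as a URL is an assumption.

```python
def classify_input(value: str) -> str:
    """Label an input string by its prefix (hypothetical helper, not an SDK function)."""
    if value.startswith("reducto://"):
        return "file_id"
    if value.startswith("jobid://"):
        return "job_reference"
    if value.startswith(("http://", "https://")):
        return "url"
    raise ValueError(f"unrecognized input format: {value!r}")
```

A check like this can catch a malformed reference before a request is sent.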
Configuration Examples
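The subsections that follow each cover one option. As a rough sketch, they might be expressed as plain dictionaries and combined as below. Only `chunk_mode: "variable"`, `agentic: [{"scope": "text"}]`, and `filter_blocks` are named on this page; the nesting and every other key and value (`chunking`, `table_output_format`, `summarize_figures`, `page_range`, `return_images`, `"Footer"`) are assumptions to check against the API reference.

```python
# Keys marked "assumed" do not appear on this page; verify them before use.
retrieval = {
    "chunking": {"chunk_mode": "variable"},  # nesting under "chunking" is assumed
    "filter_blocks": ["Header", "Footer"],   # "Footer" as a block type is assumed
}
enhance = {
    "agentic": [{"scope": "text"}],
    "summarize_figures": True,               # assumed key
}
formatting = {"table_output_format": "html"}  # assumed key and value
settings = {
    "page_range": {"start": 1, "end": 5},    # assumed key and shape
    "return_images": ["figure", "table"],    # assumed key and values
}

# One dictionary per optional parameter of parse.run().
parse_kwargs = {
    "enhance": enhance,
    "formatting": formatting,
    "retrieval": retrieval,
    "settings": settings,
}
```

With a configured client (not shown here), these would be passed alongside `input` as keyword arguments to `parse.run()`.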
Chunking
By default, Parse returns the entire document as one chunk. For RAG applications, use variable chunking:
Table Output Format
Control how tables appear in the output:
Agentic Mode
Use an LLM to review and correct the parsing output:
Figure Summaries
Generate descriptions for charts and images:
Page Range
Process only specific pages:
Filter Blocks
Remove specific content types from the output:
Return Images
Get image URLs for figures and tables:
Response Structure
The `ParseResponse` object contains:
Chunks
Each chunk contains:
- `content` (str): Full text content formatted as Markdown
- `embed` (str): Content optimized for embeddings
- `blocks` (list[Block]): Individual elements with positions
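As a sketch of how these fields might be consumed in a RAG pipeline, the snippet below models a chunk with a small dataclass and collects the `embed` text for indexing. The `Chunk` and `Block` classes here are simplified stand-ins defined for the example, not the SDK's types, and `texts_for_embedding` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Block:
    """Simplified stand-in for a parsed element."""
    type: str
    content: str


@dataclass
class Chunk:
    """Simplified stand-in with the three fields listed above."""
    content: str
    embed: str
    blocks: list[Block] = field(default_factory=list)


def texts_for_embedding(chunks: list[Chunk]) -> list[str]:
    """Return the embedding-oriented text of each chunk, skipping blank ones."""
    return [chunk.embed for chunk in chunks if chunk.embed.strip()]


sample_chunks = [
    Chunk(content="# Title\n\nBody text", embed="Title. Body text",
          blocks=[Block("Title", "Title"), Block("Text", "Body text")]),
    Chunk(content="", embed="   "),
]
embedding_inputs = texts_for_embedding(sample_chunks)
```

The design choice here is to index `embed` rather than `content`, since the former is described as optimized for embeddings, and to keep `content` for display.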
Blocks
Each block contains:
- `type` (str): Element type (`Title`, `Header`, `Text`, `Table`, `Figure`, etc.)
- `content` (str): The block’s content
- `bbox` (BoundingBox): Position on the page (normalized 0-1 coordinates)
- `confidence` (str): Confidence level (`"high"` or `"low"`)
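Because `bbox` values are normalized to the 0-1 range, they need scaling before they can be drawn on a rendered page. The sketch below shows that conversion and a filter for low-confidence blocks. The `left`, `top`, `width`, and `height` field names are assumptions (this page does not list the fields of `BoundingBox`), and both classes are stand-ins rather than the SDK's types.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class BoundingBox:
    # Field names are assumed; values are fractions of the page size (0-1).
    left: float
    top: float
    width: float
    height: float


@dataclass
class Block:
    type: str
    content: str
    bbox: BoundingBox
    confidence: str  # "high" or "low"


def to_pixels(bbox: BoundingBox, page_width: int, page_height: int) -> tuple[int, int, int, int]:
    """Scale a normalized box to pixel corners (x0, y0, x1, y1)."""
    x0 = round(bbox.left * page_width)
    y0 = round(bbox.top * page_height)
    x1 = round((bbox.left + bbox.width) * page_width)
    y1 = round((bbox.top + bbox.height) * page_height)
    return x0, y0, x1, y1


def needs_review(blocks: list[Block]) -> list[Block]:
    """Return the blocks flagged with low confidence."""
    return [block for block in blocks if block.confidence == "low"]
```

For example, a box with `left=0.25`, `top=0.5`, `width=0.5`, `height=0.25` on a 1000 by 800 pixel page maps to corners `(250, 400, 750, 600)`.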
URL Results
For large documents, the response may return a URL instead of inline content:
Advanced Features
Raw Response Access
Access raw HTTP response data:
Streaming Response
Stream large responses:
Per-Request Options
Override client settings for this request:
Error Handling
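This page does not name the client's exception classes, so the sketch below is a generic retry-with-backoff wrapper rather than the SDK's documented pattern. `call_with_retry` is a hypothetical helper; which exception classes count as transient is left to the caller.

```python
from __future__ import annotations

import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on the given exceptions with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # waits 1x, 2x, 4x, ... base_delay
    raise AssertionError("unreachable")
```

A parse call would be wrapped as a zero-argument callable, for example a `lambda`, with `retry_on` set to whatever connection or rate-limit exceptions the client raises.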
Complete Example
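As a hedged sketch of the post-processing half of such a workflow, the function below takes an already-returned result, follows a URL result if needed, and gathers chunk text. It works on plain dictionaries; the `"url"` type value and the `url`, `chunks`, and `content` key names are assumptions, and the call to `parse.run()` itself is omitted.

```python
from __future__ import annotations

import json
from typing import Any, Callable
from urllib.request import urlopen


def fetch_json(url: str) -> dict[str, Any]:
    """Download and decode a JSON document."""
    with urlopen(url) as response:
        return json.load(response)


def resolve_chunks(
    result: dict[str, Any],
    fetch: Callable[[str], dict[str, Any]] = fetch_json,
) -> list[dict[str, Any]]:
    """Return the chunk list, fetching it first when the result is a URL pointer."""
    if result.get("type") == "url":  # assumed marker for a URL result
        result = fetch(result["url"])
    return result.get("chunks", [])


def document_text(result: dict[str, Any], fetch: Callable[[str], dict[str, Any]] = fetch_json) -> str:
    """Join the Markdown content of every chunk into one string."""
    return "\n\n".join(chunk["content"] for chunk in resolve_chunks(result, fetch))
```

Passing `fetch` as a parameter keeps the network call replaceable, which makes the URL branch easy to exercise in tests.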
Best Practices
Use Variable Chunking for RAG
Enable `chunk_mode: "variable"` for RAG pipelines to get semantically meaningful chunks.
Enable Agentic for Scanned Docs
Use `agentic: [{"scope": "text"}]` for scanned documents or poor-quality PDFs.
Filter Headers/Footers
Use `filter_blocks` to remove headers and footers that pollute search results.
Handle URL Results
Always check `result.type` and handle URL results for large documents.
Next Steps
- Learn about extracting specific fields from parsed documents
- Explore response format details for complete structure
- Check out best practices for optimization
- See the async client for concurrent processing