The parse.run() method converts documents into structured JSON containing text, tables, and figures. It runs OCR, detects document layout, and returns content organized into chunks optimized for LLM and RAG workflows.
Basic Usage
Method Signatures
Synchronous Parse
Asynchronous Parse
The runJob method returns a job_id that you can pass to client.job.get() to retrieve results.
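The submit-then-poll flow described above can be sketched as follows. This is a minimal sketch, not the SDK's documented polling pattern: only runJob, job_id, and client.job.get() come from this page. The JobClient interface, the status values ("Pending", "Completed", "Failed"), the result and reason fields, and the waitForJob helper are assumptions made for illustration. A stub client stands in for the real one so the snippet runs on its own.

```typescript
// Hypothetical minimal shapes; the real SDK types may differ.
type JobStatus = "Pending" | "Completed" | "Failed"; // assumed status values

interface JobState<T> {
  status: JobStatus;
  result?: T;      // assumed field name
  reason?: string; // assumed field name
}

interface JobClient<T> {
  parse: { runJob(args: { input: string }): Promise<{ job_id: string }> };
  job: { get(jobId: string): Promise<JobState<T>> };
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Poll client.job.get() until the job finishes or the attempt budget runs out.
async function waitForJob<T>(
  client: JobClient<T>,
  jobId: string,
  intervalMs = 1000,
  maxAttempts = 60,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const state = await client.job.get(jobId);
    if (state.status === "Completed" && state.result !== undefined) {
      return state.result;
    }
    if (state.status === "Failed") {
      throw new Error(`Job ${jobId} failed: ${state.reason ?? "unknown reason"}`);
    }
    await sleep(intervalMs);
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}

// Stub client for demonstration: reports "Pending" twice, then "Completed".
function makeStubClient(): JobClient<string> {
  let polls = 0;
  return {
    parse: { runJob: async () => ({ job_id: "job-123" }) },
    job: {
      get: async (): Promise<JobState<string>> =>
        ++polls < 3
          ? { status: "Pending" }
          : { status: "Completed", result: "parsed output" },
    },
  };
}
```

Capping the number of attempts keeps a stuck job from blocking the caller forever. Tune intervalMs and maxAttempts to the document sizes you expect.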
Input Options
The input parameter accepts several formats:
Configuration Examples
Chunking
By default, Parse returns the entire document as one chunk. For RAG applications, use variable chunking:
Table Output Format
Control how tables appear in the output:
Agentic Mode
Use an LLM to review and correct parsing output:
Figure Summaries
Generate descriptions for charts and images:
Page Range
Process only specific pages:
Filter Blocks
Remove specific content types from output:
Response Structure
The ParseResponse object contains:
URL Results
For large documents, the response may return a URL instead of inline content:
Error Handling
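This page does not name the SDK's error classes, so the following is only a minimal sketch of a defensive call pattern: wrap the parse call and turn any thrown value into a typed outcome instead of letting it propagate. ParseOutcome and safeParse are hypothetical names introduced for illustration, and parseFn stands in for whatever call you make, such as a closure around parse.run().

```typescript
// Hypothetical result wrapper; not part of the SDK.
type ParseOutcome<T> =
  | { ok: true; value: T }
  | { ok: false; message: string };

// Run any async parse call and capture failures as data.
async function safeParse<T>(
  parseFn: () => Promise<T>,
): Promise<ParseOutcome<T>> {
  try {
    return { ok: true, value: await parseFn() };
  } catch (err: unknown) {
    // Thrown values are not guaranteed to be Error instances.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, message };
  }
}
```

Returning a tagged outcome makes the failure path explicit at the call site: callers branch on ok rather than relying on a surrounding try/catch.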
Complete Example
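A minimal end-to-end sketch that ties together the options and checks named under Best Practices: chunk_mode, agentic, filter_blocks, and the result.type check for URL results. Everything beyond those names is an assumption: the flat ParseOptions shape, the "Header" and "Footer" block names, the "full" and "url" type values, the chunks, content, and url fields, and the ParseLike client interface. A stub client replaces the real SDK so the snippet runs on its own; consult the SDK reference for the actual request and response shapes.

```typescript
// Assumed request shape: only the field names come from this page.
interface ParseOptions {
  input: string;
  chunk_mode?: "variable";
  agentic?: { scope: "text" }[];
  filter_blocks?: string[]; // block names used below are assumed
}

interface Chunk {
  content: string; // assumed field name
}

// Assumed response shape: result.type separates inline from URL results.
type ParseResult =
  | { type: "full"; chunks: Chunk[] }
  | { type: "url"; url: string };

// Hypothetical stand-in for the SDK client.
interface ParseLike {
  parse: { run(options: ParseOptions): Promise<{ result: ParseResult }> };
}

type FetchJson = (url: string) => Promise<{ chunks: Chunk[] }>;

// Resolve chunks whether they arrive inline or behind a URL.
async function loadChunks(result: ParseResult, fetchJson: FetchJson): Promise<Chunk[]> {
  if (result.type === "url") {
    return (await fetchJson(result.url)).chunks;
  }
  return result.chunks;
}

// Parse one document with RAG-oriented settings and return the chunk texts.
async function parseForRag(
  client: ParseLike,
  input: string,
  fetchJson: FetchJson,
): Promise<string[]> {
  const options: ParseOptions = {
    input,
    chunk_mode: "variable",              // semantically meaningful chunks
    agentic: [{ scope: "text" }],        // LLM review of parsed text
    filter_blocks: ["Header", "Footer"], // assumed block names
  };
  const { result } = await client.parse.run(options);
  const chunks = await loadChunks(result, fetchJson);
  return chunks.map((chunk) => chunk.content);
}

// Stub client for demonstration: returns a fixed result and records the options.
function makeStub(result: ParseResult): ParseLike & { lastOptions?: ParseOptions } {
  const stub: ParseLike & { lastOptions?: ParseOptions } = {
    parse: {
      run: async (options: ParseOptions) => {
        stub.lastOptions = options;
        return { result };
      },
    },
  };
  return stub;
}
```

In real use, fetchJson would wrap an HTTP GET of the result URL and parse the JSON body. Injecting it keeps the download step easy to swap or test.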
Best Practices
Use Variable Chunking for RAG
Enable chunk_mode: "variable" for RAG pipelines to get semantically meaningful chunks.
Enable Agentic for Scanned Docs
Use agentic: [{ scope: "text" }] for scanned documents or poor-quality PDFs.
Filter Headers/Footers
Use filter_blocks to remove headers and footers that pollute search results.
Handle URL Results
Always check result.type and handle URL results for large documents.
Next Steps
- Learn about extracting specific fields from parsed documents
- Explore response format details for the complete structure
- Check out best practices for optimization