
Parse pipeline in Studio showing a financial statement with detected regions
What Parse extracts
Parse breaks your document into chunks, each representing a semantic unit of content:
- Text blocks: Paragraphs and body text, preserving reading order across columns
- Tables: Structured data with rows and columns, output as markdown, HTML, or JSON
- Figures: Images, charts, and diagrams with optional AI-generated descriptions
- Headers: Section titles with hierarchy levels for document structure
- Key-value pairs: Form-like content where a label maps to a value
- Footers: Page numbers, disclaimers, and repeated bottom-of-page content
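
As a rough illustration of how the content types above might be handled downstream, the sketch below groups chunks by type. The field names (`chunks`, `type`, `content`) and the type labels are assumptions made for this example, not the documented response schema; check the JSON view in the Results tab for the real structure.

```python
from collections import defaultdict

# Hypothetical parsed output. The keys "chunks", "type", and "content" and the
# type labels are assumptions for illustration, not the documented schema.
sample_result = {
    "chunks": [
        {"type": "Header", "content": "Balance Sheet"},
        {"type": "Text", "content": "Figures are in thousands of USD."},
        {"type": "Table", "content": "| Item | 2023 |\n|---|---|\n| Cash | 120 |"},
        {"type": "Text", "content": "See the notes for accounting policies."},
        {"type": "Footer", "content": "Page 1"},
    ]
}


def group_chunks_by_type(result):
    """Collect chunk contents under their content type, in document order."""
    grouped = defaultdict(list)
    for chunk in result.get("chunks", []):
        grouped[chunk["type"]].append(chunk["content"])
    return dict(grouped)


grouped = group_chunks_by_type(sample_result)
for chunk_type, contents in grouped.items():
    print(f"{chunk_type}: {len(contents)} chunk(s)")
```

Grouping by type makes it easy to, for example, send only tables to a spreadsheet export while dropping footers.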

Bounding boxes showing detected content types
When to adjust settings
The default configuration handles most documents well. The Configurations tab offers two modes:
Simple mode exposes the most common settings:
- Contains Handwritten Text: Routes through OCR with AI enhancement
- Enable AI Summarization: Generates descriptions of figures and charts
- Return Figure/Table Images: Includes extracted images as URLs

Advanced mode exposes the full API configuration surface:
variable with a target size around 500-1000 characters.
Formatting: Control output structure. Switch the table format to html or json for programmatic use. Enable additional metadata such as page numbers or confidence scores.
Spreadsheet: Handle Excel and CSV files. Control multi-sheet behavior and whether to include sheet names in the output.
Settings: Core processing controls. Set the extraction mode to ocr for scanned documents, and specify page ranges to process only the relevant sections.
See Parse Configurations for the complete reference.
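
The sketch below shows one way the advanced settings described above could be assembled as a configuration object. Every key name here (`formatting`, `table_format`, `settings`, `extraction_mode`, `page_range`) and the helper `build_parse_config` are hypothetical, chosen only to mirror the groups named above; the Parse Configurations reference is the authority on the real field names.

```python
import json

# Table formats named in this guide: markdown, HTML, or JSON.
ALLOWED_TABLE_FORMATS = {"markdown", "html", "json"}


def build_parse_config(table_format="markdown", scanned=False, page_range=None):
    """Build a configuration dict (hypothetical key names, not the real API)."""
    if table_format not in ALLOWED_TABLE_FORMATS:
        raise ValueError(f"unsupported table format: {table_format!r}")

    config = {"formatting": {"table_format": table_format}, "settings": {}}

    # Scanned documents are routed through OCR.
    if scanned:
        config["settings"]["extraction_mode"] = "ocr"

    # Restrict processing to an inclusive, 1-based page range.
    if page_range is not None:
        start, end = page_range
        if start < 1 or end < start:
            raise ValueError(f"invalid page range: {page_range!r}")
        config["settings"]["page_range"] = {"start": start, "end": end}

    return config


config = build_parse_config(table_format="html", scanned=True, page_range=(1, 5))
print(json.dumps(config, indent=2))
```

Validating values before sending a request keeps typos such as an unsupported table format from surfacing later as a failed job.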
Working with results
The Results tab shows parsed output as formatted markdown by default. The toolbar offers several options:
- Copy: Copy the output to your clipboard
- Download: Save results as a file
- JSON: Toggle to see the raw API response structure
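
If you save the raw JSON to disk, a short script can turn it back into a single markdown document. This is a minimal sketch: the file name `parse_result.json` and the `chunks` and `content` fields are assumptions for illustration, and a temporary file stands in for a real download.

```python
import json
import tempfile
from pathlib import Path


def load_markdown(json_path):
    """Join the content of every chunk in a saved response into one string."""
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    return "\n\n".join(chunk["content"] for chunk in data.get("chunks", []))


# Stand-in for a saved response; the structure is assumed, not documented.
sample = {
    "chunks": [
        {"type": "Header", "content": "# Balance Sheet"},
        {"type": "Text", "content": "Figures are in thousands of USD."},
    ]
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "parse_result.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    markdown = load_markdown(path)

print(markdown)
```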