The parse.run() method converts documents into structured JSON with text, tables, and figures. It runs OCR, detects document layout, and returns content organized into chunks optimized for LLM and RAG workflows.

Basic Usage

import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload and parse
const upload = await client.upload({ 
  file: fs.createReadStream("invoice.pdf") 
});
const result = await client.parse.run({ input: upload.file_id });

// Access the results (see "URL Results" below for large documents)
if (result.result.type === "full") {
  for (const chunk of result.result.chunks) {
    console.log(chunk.content);
    for (const block of chunk.blocks) {
      console.log(`  ${block.type} on page ${block.bbox.page}`);
    }
  }
}

Method Signatures

Synchronous Parse

parse.run(params: {
  input: string | Upload;
  enhance?: Enhance;
  formatting?: Formatting;
  retrieval?: Retrieval;
  settings?: Settings;
  spreadsheet?: Spreadsheet;
}, options?: RequestOptions): Promise<ParseRunResponse>

Asynchronous Parse

parse.runJob(params: {
  input: string | Upload;
  async?: ConfigV3AsyncConfig;
  enhance?: Enhance;
  formatting?: Formatting;
  retrieval?: Retrieval;
  settings?: Settings;
  spreadsheet?: Spreadsheet;
}, options?: RequestOptions): Promise<ParseRunJobResponse>
The runJob method returns a job_id that you can use with client.job.get() to retrieve results.
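
A minimal polling sketch follows. Hedged: the exact client.job.get() call shape and the status strings used here are assumptions; consult the job response schema for the actual field values:
// Submit the parse asynchronously
const job = await client.parse.runJob({ input: upload.file_id });

// Poll for completion. The "Pending"/"Completed" status strings are
// assumptions; check the job.get() response schema for exact values.
let jobStatus = await client.job.get(job.job_id);
while (jobStatus.status === "Pending") {
  await new Promise((resolve) => setTimeout(resolve, 2000)); // wait 2s between polls
  jobStatus = await client.job.get(job.job_id);
}

if (jobStatus.status === "Completed") {
  console.log(jobStatus.result);
} else {
  console.error(`Job did not complete: ${jobStatus.status}`);
}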

Input Options

The input parameter accepts several formats:
// From upload
const result = await client.parse.run({ input: upload.file_id });

// Public URL
const result = await client.parse.run({ input: "https://example.com/doc.pdf" });

// Presigned S3 URL
const result = await client.parse.run({ 
  input: "https://bucket.s3.amazonaws.com/doc.pdf?X-Amz-..." 
});

// Reprocess previous job
const result = await client.parse.run({ 
  input: "jobid://7600c8c5-a52f-49d2-8a7d-d75d1b51e141" 
});

Configuration Examples

Chunking

By default, Parse returns the entire document as one chunk. For RAG applications, use variable chunking:
const result = await client.parse.run({
  input: upload.file_id,
  retrieval: {
    chunking: {
      chunk_mode: "variable"  // Options: "disabled", "variable", "page", "section"
    }
  }
});

Table Output Format

Control how tables appear in the output:
const result = await client.parse.run({
  input: upload.file_id,
  formatting: {
    table_output_format: "html"  // Options: "dynamic", "html", "md", "json", "csv"
  }
});

Agentic Mode

Use an LLM to review and correct the parsing output:
const result = await client.parse.run({
  input: upload.file_id,
  enhance: {
    agentic: [
      { scope: "text" },      // For OCR correction
      { scope: "table" },     // For table structure fixes
      { scope: "figure" }     // For chart extraction
    ]
  }
});

Figure Summaries

Generate descriptions for charts and images:
const result = await client.parse.run({
  input: upload.file_id,
  enhance: {
    summarize_figures: true
  }
});
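
With summarize_figures enabled, the generated descriptions travel with the figure blocks. A minimal sketch for reading them, assuming figure blocks carry type "Figure" and expose the summary via a content field (both are assumptions based on the block shape shown in Basic Usage):
if (result.result.type === "full") {
  for (const chunk of result.result.chunks) {
    for (const block of chunk.blocks) {
      // Assumption: summaries surface on blocks of type "Figure"
      if (block.type === "Figure") {
        console.log(`Figure on page ${block.bbox.page}: ${block.content}`);
      }
    }
  }
}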

Page Range

Process only specific pages:
const result = await client.parse.run({
  input: upload.file_id,
  settings: {
    page_range: {
      start: 1,
      end: 10
    }
  }
});

Filter Blocks

Remove specific content types from output:
const result = await client.parse.run({
  input: upload.file_id,
  retrieval: {
    filter_blocks: ["Header", "Footer", "Page Number"]
  }
});

Response Structure

The ParseRunResponse object contains:
const result = await client.parse.run({ input: upload.file_id });

// Top-level fields
console.log(result.job_id);          // string: Unique job identifier
console.log(result.duration);        // number: Processing time in seconds
console.log(result.studio_link);     // string: Link to view in Studio

// Usage information
console.log(result.usage.num_pages);  // number: Pages processed
console.log(result.usage.credits);    // number: Credits consumed

// Result content
if (result.result.type === "full") {
  const chunks = result.result.chunks;
  for (const chunk of chunks) {
    console.log(chunk.content);     // string: Full text content
    console.log(chunk.embed);       // string: Embedding-optimized content
    console.log(chunk.blocks);      // Array: Individual elements
  }
}

URL Results

For large documents, the response may return a URL instead of inline content:
const result = await client.parse.run({ input: upload.file_id });

if (result.result.type === "url") {
  // Fetch the content from the URL
  const response = await fetch(result.result.url);
  const chunks = await response.json();
} else {
  // Content is inline
  const chunks = result.result.chunks;
}

Error Handling

import Reducto, { APIError, BadRequestError } from 'reductoai';

try {
  const result = await client.parse.run({ input: upload.file_id });
} catch (error) {
  if (error instanceof BadRequestError) {
    console.error(`Invalid request: ${error.status} - ${error.message}`);
  } else if (error instanceof APIError) {
    console.error(`API error: ${error.status} - ${error.message}`);
  }
}

Complete Example

import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload
const upload = await client.upload({ 
  file: fs.createReadStream("financial-statement.pdf") 
});

// Parse with configuration
const result = await client.parse.run({
  input: upload.file_id,
  enhance: {
    agentic: [{ scope: "table" }],
    summarize_figures: true
  },
  formatting: {
    table_output_format: "html"
  },
  retrieval: {
    chunking: { chunk_mode: "variable" }
  },
  settings: {
    page_range: { start: 1, end: 5 }
  }
});

// Process results
console.log(`Processed ${result.usage.num_pages} pages`);
console.log(`Used ${result.usage.credits} credits`);
console.log(`View in Studio: ${result.studio_link}`);

// Resolve chunks, handling both inline and URL results
const chunks = result.result.type === "full"
  ? result.result.chunks
  : await (await fetch(result.result.url)).json();

for (let i = 0; i < chunks.length; i++) {
  const chunk = chunks[i];
  console.log(`\n=== Chunk ${i + 1} ===`);
  console.log(chunk.content.substring(0, 500));  // First 500 chars
  
  // Count block types
  const blockTypes: Record<string, number> = {};
  for (const block of chunk.blocks) {
    blockTypes[block.type] = (blockTypes[block.type] || 0) + 1;
  }
  
  console.log(`Block types:`, blockTypes);
}

Best Practices

Use Variable Chunking for RAG

Enable chunk_mode: "variable" for RAG pipelines to get semantically meaningful chunks.

Enable Agentic for Scanned Docs

Use agentic: [{ scope: "text" }] for scanned documents or poor-quality PDFs.

Filter Headers/Footers

Use filter_blocks to remove headers and footers that pollute search results.

Handle URL Results

Always check result.result.type and fetch URL results for large documents, as sketched below.
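
A small helper, sketched from the URL Results example above (the ParseRunResponse type name comes from the method signature; its import path is not shown here):
// Normalize a parse result to an array of chunks, fetching from the
// presigned URL when the payload was too large to return inline.
async function getChunks(result: ParseRunResponse) {
  if (result.result.type === "url") {
    const response = await fetch(result.result.url);
    return await response.json();
  }
  return result.result.chunks;
}

const chunks = await getChunks(result);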

Next Steps