The parse.run() method converts documents into structured JSON with text, tables, and figures. It runs OCR, detects document layout, and returns content organized into chunks optimized for LLM and RAG workflows.
Basic Usage
```typescript
import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload and parse
const upload = await client.upload({
  file: fs.createReadStream("invoice.pdf")
});
const result = await client.parse.run({ input: upload.file_id });

// Access the results
for (const chunk of result.result.chunks) {
  console.log(chunk.content);
  for (const block of chunk.blocks) {
    console.log(`${block.type} on page ${block.bbox.page}`);
  }
}
```
Method Signatures
Synchronous Parse
```typescript
parse.run(params: {
  input: string | Upload;
  enhance?: Enhance;
  formatting?: Formatting;
  retrieval?: Retrieval;
  settings?: Settings;
  spreadsheet?: Spreadsheet;
}, options?: RequestOptions): Promise<ParseRunResponse>
```
Asynchronous Parse
```typescript
parse.runJob(params: {
  input: string | Upload;
  async?: ConfigV3AsyncConfig;
  enhance?: Enhance;
  formatting?: Formatting;
  retrieval?: Retrieval;
  settings?: Settings;
  spreadsheet?: Spreadsheet;
}, options?: RequestOptions): Promise<ParseRunJobResponse>
```
The runJob method returns immediately with a job_id; pass it to client.job.retrieve() to fetch the results once the job completes.
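A minimal polling sketch for that retrieve loop. The "Completed"/"Failed" status strings and the shape of the job response are assumptions, not confirmed SDK values; check the SDK's types for the exact names.

```typescript
// Generic polling helper: repeatedly calls fetchStatus until the job
// settles. The "Completed" / "Failed" status strings are assumptions
// about the job response shape — verify against the SDK types.
async function waitForJob<T>(
  fetchStatus: () => Promise<{ status: string; result?: T }>,
  intervalMs = 2000,
): Promise<T> {
  for (;;) {
    const s = await fetchStatus();
    if (s.status === "Completed") return s.result as T;
    if (s.status === "Failed") throw new Error("parse job failed");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Hypothetical usage with a configured client:
// const job = await client.parse.runJob({ input: upload.file_id });
// const parsed = await waitForJob(() => client.job.retrieve(job.job_id));
```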
The input parameter accepts several formats:
```typescript
// From upload
const result = await client.parse.run({ input: upload.file_id });

// Public URL
const result = await client.parse.run({ input: "https://example.com/doc.pdf" });

// Presigned S3 URL
const result = await client.parse.run({
  input: "https://bucket.s3.amazonaws.com/doc.pdf?X-Amz-..."
});

// Reprocess previous job
const result = await client.parse.run({
  input: "jobid://7600c8c5-a52f-49d2-8a7d-d75d1b51e141"
});
```
Configuration Examples
Chunking
By default, Parse returns the entire document as one chunk. For RAG applications, use variable chunking:
```typescript
const result = await client.parse.run({
  input: upload.file_id,
  retrieval: {
    chunking: {
      chunk_mode: "variable" // Options: "disabled", "variable", "page", "section"
    }
  }
});
```
Table Output Format
Control how tables appear in the output:
```typescript
const result = await client.parse.run({
  input: upload.file_id,
  formatting: {
    table_output_format: "html" // Options: "dynamic", "html", "md", "json", "csv"
  }
});
```
Agentic Mode
Use LLM to review and correct parsing output:
```typescript
const result = await client.parse.run({
  input: upload.file_id,
  enhance: {
    agentic: [
      { scope: "text" },  // For OCR correction
      { scope: "table" }, // For table structure fixes
      { scope: "figure" } // For chart extraction
    ]
  }
});
```
Figure Summarization
Generate descriptions for charts and images:
```typescript
const result = await client.parse.run({
  input: upload.file_id,
  enhance: {
    summarize_figures: true
  }
});
```
Page Range
Process only specific pages:
```typescript
const result = await client.parse.run({
  input: upload.file_id,
  settings: {
    page_range: {
      start: 1,
      end: 10
    }
  }
});
```
Filter Blocks
Remove specific content types from output:
```typescript
const result = await client.parse.run({
  input: upload.file_id,
  retrieval: {
    filter_blocks: ["Header", "Footer", "Page Number"]
  }
});
```
Response Structure
The ParseRunResponse object contains:
```typescript
const result = await client.parse.run({ input: upload.file_id });

// Top-level fields
console.log(result.job_id);      // string: Unique job identifier
console.log(result.duration);    // number: Processing time in seconds
console.log(result.studio_link); // string: Link to view in Studio

// Usage information
console.log(result.usage.num_pages); // number: Pages processed
console.log(result.usage.credits);   // number: Credits consumed

// Result content
if (result.result.type === "full") {
  for (const chunk of result.result.chunks) {
    console.log(chunk.content); // string: Full text content
    console.log(chunk.embed);   // string: Embedding-optimized content
    console.log(chunk.blocks);  // Array: Individual elements
  }
}
```
URL Results
For large documents, the response may return a URL instead of inline content:
```typescript
const result = await client.parse.run({ input: upload.file_id });

if (result.result.type === "url") {
  // Fetch the content from the URL
  const response = await fetch(result.result.url);
  const chunks = await response.json();
} else {
  // Content is inline
  const chunks = result.result.chunks;
}
```
Error Handling
```typescript
import Reducto, { APIError, BadRequestError } from 'reductoai';

const client = new Reducto();

try {
  const result = await client.parse.run({ input: upload.file_id });
} catch (error) {
  if (error instanceof BadRequestError) {
    console.error(`Invalid request: ${error.status} - ${error.message}`);
  } else if (error instanceof APIError) {
    console.error(`API error: ${error.status} - ${error.message}`);
  }
}
```
Complete Example
```typescript
import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload
const upload = await client.upload({
  file: fs.createReadStream("fidelity-example.pdf")
});

// Parse with configuration
const result = await client.parse.run({
  input: upload.file_id,
  enhance: {
    agentic: [{ scope: "table" }],
    summarize_figures: true
  },
  formatting: {
    table_output_format: "html"
  },
  retrieval: {
    chunking: { chunk_mode: "variable" }
  },
  settings: {
    page_range: { start: 1, end: 5 }
  }
});

// Process results
console.log(`Processed ${result.usage.num_pages} pages`);
console.log(`Used ${result.usage.credits} credits`);
console.log(`View in Studio: ${result.studio_link}`);

// Guard on the result type before reading chunks (see URL Results above)
if (result.result.type === "full") {
  result.result.chunks.forEach((chunk, i) => {
    console.log(`\n=== Chunk ${i + 1} ===`);
    console.log(chunk.content.substring(0, 500)); // First 500 chars

    // Count block types
    const blockTypes: Record<string, number> = {};
    for (const block of chunk.blocks) {
      blockTypes[block.type] = (blockTypes[block.type] || 0) + 1;
    }
    console.log(`Block types:`, blockTypes);
  });
}
```
Best Practices
- Use variable chunking for RAG: enable chunk_mode: "variable" in RAG pipelines to get semantically meaningful chunks.
- Enable agentic mode for scanned docs: use agentic: [{ scope: "text" }] for scanned documents or poor-quality PDFs.
- Filter headers and footers: use filter_blocks to remove headers, footers, and page numbers that pollute search results.
- Handle URL results: always check result.result.type and fetch the "url" variant for large documents.
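Taken together, these practices suggest a starting configuration along the following lines. This is a sketch of reasonable defaults, not a prescription; tune the options per document set.

```typescript
// Starting-point parse configuration for a RAG pipeline,
// combining the practices above.
const ragParseParams = {
  retrieval: {
    chunking: { chunk_mode: "variable" },               // semantic chunks for RAG
    filter_blocks: ["Header", "Footer", "Page Number"], // drop noise blocks
  },
  enhance: {
    agentic: [{ scope: "text" }], // OCR correction for scanned docs
  },
};

// Then spread it into the call and branch on the result type:
// const result = await client.parse.run({ input: upload.file_id, ...ragParseParams });
// if (result.result.type === "url") { /* fetch result.result.url */ }
```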
Next Steps