Extract

The extract.run() method pulls specific fields from documents as structured JSON. You define a JSON schema with the fields you need, and Extract returns values matching that schema.

Basic Usage

import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload
const upload = await client.upload({ 
  file: fs.createReadStream("invoice.pdf") 
});

// Extract with schema
const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema: {
      type: "object",
      properties: {
        invoice_number: {
          type: "string",
          description: "The invoice number, typically at the top"
        },
        total: {
          type: "number",
          description: "The total amount due"
        },
        date: {
          type: "string",
          description: "Invoice date"
        }
      }
    }
  }
});

// Access extracted values
console.log(result.result[0].invoice_number);
console.log(result.result[0].total);

Method Signatures

Synchronous Extract

extract.run(params: {
  input: string | Upload;
  instructions?: {
    schema?: unknown;
    system_prompt?: string;
  };
  parsing?: ParseOptions;
  settings?: {
    array_extract?: boolean;
    citations?: {
      enabled?: boolean;
      numerical_confidence?: boolean;
    };
    include_images?: boolean;
    optimize_for_latency?: boolean;
  };
}, options?: RequestOptions): Promise<ExtractRunResponse>

Asynchronous Extract

extract.runJob(params: {
  input: string | Upload;
  async?: ConfigV3AsyncConfig;
  instructions?: {
    schema?: unknown;
    system_prompt?: string;
  };
  parsing?: ParseOptions;
  settings?: {
    array_extract?: boolean;
    citations?: {
      enabled?: boolean;
      numerical_confidence?: boolean;
    };
    include_images?: boolean;
    optimize_for_latency?: boolean;
  };
}, options?: RequestOptions): Promise<ExtractRunJobResponse>

The runJob method returns a job_id that you can use with client.job.get() to retrieve results.
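Retrieving an asynchronous result usually means polling until the job finishes. The helper below is a minimal sketch of that loop, not part of the SDK: `waitForJob`, the `JobLike` shape, and the `"Completed"` / `"Failed"` status strings are assumptions made for illustration, so check them against the actual `client.job.get()` response before relying on them.

```typescript
// Hypothetical shape of a job record; the real response type comes from the SDK.
interface JobLike<T> {
  status: string; // assumed values: "Pending", "Completed", "Failed"
  result?: T;
}

// Poll `getJob` until the job completes, fails, or the attempt budget runs out.
async function waitForJob<T>(
  getJob: () => Promise<JobLike<T>>,
  intervalMs = 2000,
  maxAttempts = 60,
): Promise<T> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const job = await getJob();
    if (job.status === "Completed") {
      if (job.result === undefined) {
        throw new Error("Job completed without a result");
      }
      return job.result;
    }
    if (job.status === "Failed") {
      throw new Error("Extract job failed");
    }
    // Wait before asking again so the API is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job did not finish after ${maxAttempts} attempts`);
}
```

Assuming `client.job.get()` accepts the job_id as its argument, a call might look like `await waitForJob(() => client.job.get(job_id))`. Passing the fetch function in, rather than the client, keeps the helper independent of the SDK and easy to test with a stub.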

Schema Definition

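Pass your JSON schema in the schema field of the instructions parameter. Both are typed as optional in the signatures above, but the schema is what defines the fields Extract returns: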

const schema = {
  type: "object",
  properties: {
    field_name: {
      type: "string",  // or "number", "boolean", "array", "object"
      description: "Clear description of what to extract"
    }
  }
};

const result = await client.extract.run({
  input: upload.file_id,
  instructions: { schema }
});
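Since every field in the examples on this page carries a description of what to extract, it can help to check a schema for fields that lack one before sending it. The following is a small sketch under that assumption; `SchemaNode` and `findUndescribedFields` are hypothetical names, not SDK exports, and the type covers only the subset of JSON Schema used here.

```typescript
// Minimal structural type for the subset of JSON Schema used on this page.
interface SchemaNode {
  type: string;
  description?: string;
  properties?: Record<string, SchemaNode>;
  items?: SchemaNode;
}

// Return dotted paths of every property that lacks a description.
function findUndescribedFields(node: SchemaNode, path = ""): string[] {
  const missing: string[] = [];
  for (const [name, child] of Object.entries(node.properties ?? {})) {
    const childPath = path ? `${path}.${name}` : name;
    if (!child.description) {
      missing.push(childPath);
    }
    // Recurse into nested objects.
    missing.push(...findUndescribedFields(child, childPath));
    // Recurse into the element schema of arrays, marking the path with [].
    if (child.items) {
      missing.push(...findUndescribedFields(child.items, `${childPath}[]`));
    }
  }
  return missing;
}
```

Running it on a schema before calling extract.run() gives a list such as `["total", "line_items[].price"]` that you can log or fail on.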

Array Extraction

For documents with repeating data (line items, transactions), enable array extraction:

const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema: {
      type: "object",
      properties: {
        line_items: {
          type: "array",
          items: {
            type: "object",
            properties: {
              description: { type: "string" },
              quantity: { type: "number" },
              price: { type: "number" }
            }
          }
        }
      }
    }
  },
  settings: {
    array_extract: true
  }
});
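Once the line items come back, they can be handled like any other array. The sketch below totals them; `LineItem` and `invoiceSubtotal` are illustrative names, and it assumes that price is a per-unit price and that a value missing from the document may come back as null or be absent.

```typescript
// Mirrors the line_items schema above; fields may be missing in real output.
interface LineItem {
  description?: string | null;
  quantity?: number | null;
  price?: number | null;
}

// Sum quantity * price across all items, treating missing numbers as 0.
function invoiceSubtotal(items: LineItem[]): number {
  return items.reduce(
    (sum, item) => sum + (item.quantity ?? 0) * (item.price ?? 0),
    0,
  );
}

// Usage, following the access pattern shown earlier on this page:
// const subtotal = invoiceSubtotal(result.result[0].line_items);
```

Treating missing numbers as 0 keeps the sum from becoming NaN, though you may prefer to throw instead if an incomplete row should never be silently accepted.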

Citations

Enable citations to get source locations for each extracted value:

const result = await client.extract.run({
  input: upload.file_id,
  instructions: { schema },
  settings: {
    citations: {
      enabled: true,
      numerical_confidence: true  // 0-1 confidence score
    }
  }
});

// With citations enabled, each value is wrapped as { value, citations }
// (total_amount stands in for a field defined in your schema)
const field = result.result[0].total_amount;
console.log(`Value: ${field.value}`);
console.log(`Found on page ${field.citations[0].bbox.page}`);
console.log(`Confidence: ${field.citations[0].confidence}`);
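If downstream code expects the plain values, the wrappers can be stripped after the citations have been inspected. This is a rough sketch rather than an SDK feature: `unwrapCitations` and `Cited` are hypothetical names, and it assumes every wrapped value is an object with exactly the `value` and `citations` keys shown above, at any nesting depth.

```typescript
// Assumed wrapper shape when citations are enabled.
interface Cited<T> {
  value: T;
  citations: unknown[];
}

// Type guard: does this node look like a citation wrapper?
function isCited(node: unknown): node is Cited<unknown> {
  return (
    typeof node === "object" &&
    node !== null &&
    "value" in node &&
    "citations" in node
  );
}

// Recursively replace every { value, citations } wrapper with its value.
function unwrapCitations(node: unknown): unknown {
  if (isCited(node)) {
    return unwrapCitations(node.value);
  }
  if (Array.isArray(node)) {
    return node.map((child) => unwrapCitations(child));
  }
  if (typeof node === "object" && node !== null) {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(node)) {
      out[key] = unwrapCitations(child);
    }
    return out;
  }
  return node; // primitives pass through unchanged
}
```

One caveat: a schema that itself defines sibling fields named value and citations would be mistaken for a wrapper, so this approach only suits schemas that avoid that pair of names.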

Complete Example

import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload
const upload = await client.upload({ 
  file: fs.createReadStream("financial-statement.pdf") 
});

// Define schema
const schema = {
  type: "object",
  properties: {
    portfolio_value: {
      type: "number",
      description: "Total portfolio value at the end of the period"
    },
    total_income_ytd: {
      type: "number",
      description: "Total income year-to-date"
    },
    top_holdings: {
      type: "array",
      items: { type: "string" },
      description: "Names of the top 5 holdings"
    }
  }
};

// Extract with configuration
const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema,
    system_prompt: "Extract financial data from this investment statement."
  },
  settings: {
    citations: { enabled: true },
    array_extract: true  // For top_holdings array
  },
  parsing: {
    enhance: {
      agentic: [{ scope: "table" }]  // Better table extraction
    }
  }
});

// Process results
console.log(`Extracted ${result.result.length} results`);
console.log(`Used ${result.usage.credits} credits`);

for (let i = 0; i < result.result.length; i++) {
  const extracted = result.result[i];
  console.log(`\n=== Result ${i + 1} ===`);
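  // Citations are enabled above, so each field is wrapped as { value, citations }
  console.log(`Portfolio Value: $${extracted.portfolio_value.value.toLocaleString()}`);
  console.log(`Total Income YTD: $${extracted.total_income_ytd.value.toLocaleString()}`);
  // Assumes each array element is wrapped the same way
  console.log(`Top Holdings: ${extracted.top_holdings.map((h) => h.value).join(', ')}`);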
}

Next Steps

Learn about schema design best practices
Explore array extraction for long documents
Check out citations for source verification
