The extract.run() method pulls specific fields from documents as structured JSON. You define a JSON schema with the fields you need, and Extract returns values matching that schema.

Basic Usage

import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload
const upload = await client.upload({ 
  file: fs.createReadStream("invoice.pdf") 
});

// Extract with schema
const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema: {
      type: "object",
      properties: {
        invoice_number: {
          type: "string",
          description: "The invoice number, typically at the top"
        },
        total: {
          type: "number",
          description: "The total amount due"
        },
        date: {
          type: "string",
          description: "Invoice date"
        }
      }
    }
  }
});

// Access extracted values
console.log(result.result[0].invoice_number);
console.log(result.result[0].total);
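
Fields the document does not contain may come back missing or null, so it can help to check the first record before using it. The following is a minimal sketch, assuming only the response shape shown above (a `result` array of records). `requireFields` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): return the first extracted
// record and fail loudly if any expected field is missing or null.
function requireFields(response, fieldNames) {
  const record = Array.isArray(response?.result) ? response.result[0] : undefined;
  if (record === undefined || record === null) {
    throw new Error("Extract response contained no results");
  }
  const missing = fieldNames.filter(
    (name) => record[name] === undefined || record[name] === null
  );
  if (missing.length > 0) {
    throw new Error(`Missing extracted fields: ${missing.join(", ")}`);
  }
  return record;
}

// Stand-in response shaped like the example above (values are made up).
const sampleResponse = { result: [{ invoice_number: "INV-001", total: 125.5 }] };
const invoice = requireFields(sampleResponse, ["invoice_number", "total"]);
console.log(invoice.invoice_number, invoice.total);
```

Failing early like this keeps a missing field from surfacing later as an `undefined` in downstream code.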

Method Signatures

Synchronous Extract

extract.run(params: {
  input: string | Upload;
  instructions?: {
    schema?: unknown;
    system_prompt?: string;
  };
  parsing?: ParseOptions;
  settings?: {
    array_extract?: boolean;
    citations?: {
      enabled?: boolean;
      numerical_confidence?: boolean;
    };
    include_images?: boolean;
    optimize_for_latency?: boolean;
  };
}, options?: RequestOptions): Promise<ExtractRunResponse>

Asynchronous Extract

extract.runJob(params: {
  input: string | Upload;
  async?: ConfigV3AsyncConfig;
  instructions?: {
    schema?: unknown;
    system_prompt?: string;
  };
  parsing?: ParseOptions;
  settings?: {
    array_extract?: boolean;
    citations?: {
      enabled?: boolean;
      numerical_confidence?: boolean;
    };
    include_images?: boolean;
    optimize_for_latency?: boolean;
  };
}, options?: RequestOptions): Promise<ExtractRunJobResponse>
The runJob method returns a job_id that you can use with client.job.get() to retrieve results.
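
Retrieving the result usually means polling until the job finishes. The sketch below shows one way to do that with a caller-supplied fetch function. The `status` field and the "Completed" and "Failed" strings are assumptions made for illustration, not confirmed values, so check them against the API reference. `waitForJob` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical polling helper. `getJob` is any function that takes a job ID
// and resolves to the job record, for example a wrapper around client.job.get().
// ASSUMPTION: the job record has a `status` field; the status strings below
// are placeholders and can be overridden through `isDone` / `isFailed`.
async function waitForJob(getJob, jobId, options = {}) {
  const {
    intervalMs = 2000,
    maxAttempts = 60,
    isDone = (job) => job.status === "Completed",
    isFailed = (job) => job.status === "Failed",
  } = options;

  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const job = await getJob(jobId);
    if (isDone(job)) return job;
    if (isFailed(job)) throw new Error(`Job ${jobId} failed`);
    // Wait before the next poll.
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job ${jobId} did not finish after ${maxAttempts} attempts`);
}
```

A call might look like `waitForJob((id) => client.job.get(id), submitted.job_id)`, where `submitted` is the runJob response. Passing the ID as the first argument to `client.job.get()` is also an assumption here.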

Schema Definition

Pass a JSON schema in the schema field of the instructions parameter. Give each property a type and a clear description of what to extract:
const schema = {
  type: "object",
  properties: {
    field_name: {
      type: "string",  // or "number", "boolean", "array", "object"
      description: "Clear description of what to extract"
    }
  }
};

const result = await client.extract.run({
  input: upload.file_id,
  instructions: { schema }
});
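
Since each field's description tells Extract what to look for, it may be worth checking a schema for fields that lack one before sending it. This is a minimal sketch, and `findUndescribedFields` is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: list the dotted paths of schema properties that lack
// a non-empty description, descending into nested objects and array items.
function findUndescribedFields(schema, path = "") {
  const missing = [];
  const properties = schema.properties ?? {};
  for (const [name, definition] of Object.entries(properties)) {
    const fieldPath = path ? `${path}.${name}` : name;
    if (typeof definition.description !== "string" || definition.description.trim() === "") {
      missing.push(fieldPath);
    }
    if (definition.type === "object") {
      missing.push(...findUndescribedFields(definition, fieldPath));
    } else if (definition.type === "array" && definition.items?.type === "object") {
      missing.push(...findUndescribedFields(definition.items, `${fieldPath}[]`));
    }
  }
  return missing;
}

// Example schema with two fields that have no description.
const draftSchema = {
  type: "object",
  properties: {
    invoice_number: { type: "string", description: "The invoice number" },
    total: { type: "number" },
    line_items: {
      type: "array",
      description: "One entry per billed item",
      items: {
        type: "object",
        properties: { quantity: { type: "number" } }
      }
    }
  }
};

console.log(findUndescribedFields(draftSchema)); // [ 'total', 'line_items[].quantity' ]
```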

Array Extraction

For documents with repeating data (line items, transactions), enable array extraction:
const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema: {
      type: "object",
      properties: {
        line_items: {
          type: "array",
          items: {
            type: "object",
            properties: {
              description: { type: "string" },
              quantity: { type: "number" },
              price: { type: "number" }
            }
          }
        }
      }
    }
  },
  settings: {
    array_extract: true
  }
});
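
Once the line items are extracted, they can be processed like any other array. The sketch below totals them, assuming the returned items mirror the schema above (citations not enabled) and that `price` is a unit price. `lineItemsTotal` is a hypothetical helper and the sample rows are made up.

```javascript
// Hypothetical helper: sum quantity * price over extracted line items,
// counting rows where either number is missing or not finite as skipped.
function lineItemsTotal(lineItems) {
  let total = 0;
  let skipped = 0;
  for (const item of lineItems ?? []) {
    if (Number.isFinite(item?.quantity) && Number.isFinite(item?.price)) {
      total += item.quantity * item.price;
    } else {
      skipped += 1;
    }
  }
  return { total, skipped };
}

// Stand-in for result.result[0].line_items from the call above.
const sampleLineItems = [
  { description: "Widget", quantity: 2, price: 10 },
  { description: "Gadget", quantity: 1, price: 5.5 },
  { description: "Unreadable row", quantity: null, price: 3 }
];
console.log(lineItemsTotal(sampleLineItems)); // { total: 25.5, skipped: 1 }
```

Reporting the skipped count alongside the total makes it visible when some rows could not be read, rather than silently under-counting.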

Citations

Enable citations to get source locations for each extracted value:
const result = await client.extract.run({
  input: upload.file_id,
  instructions: { schema },
  settings: {
    citations: {
      enabled: true,
      numerical_confidence: true  // 0-1 confidence score
    }
  }
});

// With citations enabled, each value is wrapped as { value, citations }.
// total_amount stands in for a field defined in your own schema.
const field = result.result[0].total_amount;
console.log(`Value: ${field.value}`);
console.log(`Found on page ${field.citations[0].bbox.page}`);
console.log(`Confidence: ${field.citations[0].confidence}`);
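
Code written for plain values needs the wrappers removed first. The sketch below strips them and flags low-confidence fields, assuming the wrapper shape shown above (an object with `value` and a `citations` array whose entries may carry `confidence`). Both helpers are hypothetical, not part of the SDK, and the sample record is made up.

```javascript
// Hypothetical helper: recursively replace { value, citations } wrappers
// with their plain value. ASSUMPTION: a wrapper is any object that has a
// "value" key and a "citations" array, as in the snippet above.
function unwrapCitations(node) {
  if (Array.isArray(node)) return node.map(unwrapCitations);
  if (node !== null && typeof node === "object") {
    if ("value" in node && Array.isArray(node.citations)) {
      return unwrapCitations(node.value);
    }
    return Object.fromEntries(
      Object.entries(node).map(([key, child]) => [key, unwrapCitations(child)])
    );
  }
  return node;
}

// Hypothetical helper: names of top-level fields whose best citation
// confidence is below the threshold. Fields without a score are skipped.
function lowConfidenceFields(record, threshold) {
  return Object.entries(record)
    .filter(([, field]) => {
      const scores = (field?.citations ?? [])
        .map((citation) => citation.confidence)
        .filter((score) => typeof score === "number");
      return scores.length > 0 && Math.max(...scores) < threshold;
    })
    .map(([name]) => name);
}

// Stand-in for result.result[0] with citations enabled.
const cited = {
  total_amount: { value: 1250, citations: [{ bbox: { page: 1 }, confidence: 0.62 }] },
  vendor: { value: "Acme", citations: [{ bbox: { page: 1 }, confidence: 0.97 }] }
};
console.log(unwrapCitations(cited)); // { total_amount: 1250, vendor: 'Acme' }
console.log(lowConfidenceFields(cited, 0.8)); // [ 'total_amount' ]
```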

Complete Example

import Reducto from 'reductoai';
import fs from 'fs';

const client = new Reducto();

// Upload
const upload = await client.upload({ 
  file: fs.createReadStream("financial-statement.pdf") 
});

// Define schema
const schema = {
  type: "object",
  properties: {
    portfolio_value: {
      type: "number",
      description: "Total portfolio value at the end of the period"
    },
    total_income_ytd: {
      type: "number",
      description: "Total income year-to-date"
    },
    top_holdings: {
      type: "array",
      items: { type: "string" },
      description: "Names of the top 5 holdings"
    }
  }
};

// Extract with configuration
const result = await client.extract.run({
  input: upload.file_id,
  instructions: {
    schema,
    system_prompt: "Extract financial data from this investment statement."
  },
  settings: {
    citations: { enabled: true },
    array_extract: true  // For top_holdings array
  },
  parsing: {
    enhance: {
      agentic: [{ scope: "table" }]  // Better table extraction
    }
  }
});

// Process results
console.log(`Extracted ${result.result.length} results`);
console.log(`Used ${result.usage.credits} credits`);

for (let i = 0; i < result.result.length; i++) {
  const extracted = result.result[i];
  console.log(`\n=== Result ${i + 1} ===`);
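  // Citations are enabled above, so each field is read through its .value wrapper.
  // Assumption: array elements are wrapped the same way as scalar fields.
  console.log(`Portfolio Value: $${extracted.portfolio_value.value.toLocaleString()}`);
  console.log(`Total Income YTD: $${extracted.total_income_ytd.value.toLocaleString()}`);
  console.log(`Top Holdings: ${extracted.top_holdings.map((h) => h?.value ?? h).join(', ')}`);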
}
