> ## Documentation Index
> Fetch the complete documentation index at: https://docs.reducto.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# Multimodal RAG with Image Results

> Build a RAG system that retrieves and reasons over both text and images from documents

Standard RAG extracts text and misses charts, figures, and tables. Multimodal RAG indexes both text and images, then passes relevant visuals to a vision LLM at query time.

```
PDF → Reducto Parse → S3 (images) → Pinecone (embeddings) → Vision LLM
```

This cookbook builds a pipeline that answers questions using both text context and document images.

***

## Create Reducto API Key

<Steps>
  <Step title="Open Studio">
    Go to [studio.reducto.ai](https://studio.reducto.ai) and sign in. From the home page, click **API Keys** in the left sidebar.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-1.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=6fda1435e042681807741c7743273da2" alt="Studio home page with API Keys in sidebar" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-1.png" />
    </Frame>
  </Step>

  <Step title="View API Keys">
    The API Keys page shows your existing keys. Click **+ Create new API key** in the top right corner.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-2.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=10db7406c2ac7217e4b1d75e028b58e1" alt="API Keys page with Create button" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-2.png" />
    </Frame>
  </Step>

  <Step title="Configure Key">
    In the modal, enter a name for your key and set an expiration policy (or select "Never" for no expiration). Click **Create**.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-3.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=afb60f6cfb4d33940669d534dd007343" alt="New API Key modal with name and expiration fields" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-3.png" />
    </Frame>
  </Step>

  <Step title="Copy Your Key">
    Copy your new API key and store it securely. You won't be able to see it again after closing this dialog.

    <Frame>
      <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/cookbooks/dummy-docs/screenshots/api-4.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=c861b1c2f593244957cf15c6fd717f60" alt="Copy API key dialog" width="3164" height="1922" data-path="cookbooks/dummy-docs/screenshots/api-4.png" />
    </Frame>

    Set the key as an environment variable:

    ```bash theme={null}
    export REDUCTO_API_KEY="your-api-key-here"
    ```
  </Step>
</Steps>

***

## Prerequisites

You'll also need accounts and API keys from these services:

| Service  | Purpose                             | Sign up                                  |
| -------- | ----------------------------------- | ---------------------------------------- |
| AWS S3   | Permanent image storage             | [aws.amazon.com](https://aws.amazon.com) |
| Pinecone | Vector database for semantic search | [pinecone.io](https://www.pinecone.io)   |
| VoyageAI | Text embeddings                     | [voyageai.com](https://www.voyageai.com) |

<Warning>
  VoyageAI's free tier has a rate limit of 3 requests per minute. For production use, add delays between embedding calls or upgrade to a paid plan.
</Warning>

You'll also need a vision-capable LLM (Claude, GPT-4V, Gemini, etc.) for the final generation step.

Install the required packages:

<CodeGroup>
  ```bash Python theme={null}
  pip install reductoai pinecone voyageai requests boto3
  ```

  ```bash JavaScript theme={null}
  npm install reductoai @aws-sdk/client-s3 @pinecone-database/pinecone voyageai
  ```
</CodeGroup>

***

## Setup: AWS S3 bucket

We need an S3 bucket to store extracted images permanently. Reducto's image URLs expire after 1 hour for security reasons. If we stored those URLs directly in our vector database, they'd be broken by the time someone queries the system tomorrow.

S3 gives us permanent, publicly-accessible URLs that work indefinitely.

<Steps>
  <Step title="Create an IAM user">
    Never use your AWS root account for applications. Instead, create a dedicated IAM user with limited permissions.

    Go to [IAM Console](https://console.aws.amazon.com/iam) → **Users** → **Create user**.

    <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/images/aws-create-user.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=5682b3a0893f6993802f764a950d17d3" alt="create aws user reducto demo" width="2844" height="1154" data-path="images/aws-create-user.png" />
  </Step>

  <Step title="Attach S3 permissions">
    Select **Attach policies directly**, search for `AmazonS3FullAccess`, check it, and click **Create user**.

    This gives the user permission to read and write to S3 buckets, but nothing else in your AWS account.
  </Step>

  <Step title="Create access keys">
    Click on your new user → **Security credentials** → **Create access key**.

    Select "Command Line Interface (CLI)", confirm, and create.

    **Save both keys now** - you won't see the secret again. You'll need these for the `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` environment variables.
  </Step>

  <Step title="Create an S3 bucket">
    Go to [S3 Console](https://s3.console.aws.amazon.com) → **Create bucket**.

    <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/images/s3-make-bucket.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=5a81837ccd0313df2f71ed808e667f96" alt="make bucket" width="1254" height="966" data-path="images/s3-make-bucket.png" />

    Choose a unique name (e.g., `my-multimodal-rag-images`) and select a region close to you for lower latency.
  </Step>

  <Step title="Enable public access">
    By default, S3 buckets are private. We need to make objects publicly readable so your application can fetch images at query time.

    In your bucket → **Permissions** tab:

    **First**, disable the block public access settings:

    * Click **Block public access** → Edit → Uncheck all 4 boxes → Save

          <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/images/aws-block-public-access.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=52a6f164814ddeb34cd06ce7d04bc142" alt="aws uncheck public access" width="1781" height="548" data-path="images/aws-block-public-access.png" />

    **Then**, add a bucket policy that allows anyone to read objects:

    * Click **Bucket policy** → Edit → Paste this policy:

    ```json theme={null}
    {
      "Version": "2012-10-17",
      "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::YOUR-BUCKET-NAME/*"
      }]
    }
    ```

    Replace `YOUR-BUCKET-NAME` with your actual bucket name.
  </Step>
</Steps>

***

## Setup: Pinecone index

We need a vector index to store text embeddings. Each vector will also include metadata with the S3 URL, so we can fetch the corresponding image at query time.

<Steps>
  <Step title="Create index">
    <img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/images/pinecone-index.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=b6bb6246e096b454c97b9fe80c108b15" alt="make pinecone index" width="2858" height="1016" data-path="images/pinecone-index.png" />

    Go to [Pinecone Console](https://app.pinecone.io) → **Create Index**.

    * **Name**: `multimodal-rag`
    * **Dimensions**: `1024` (this must match VoyageAI's voyage-3 model)
    * **Metric**: `cosine`

    The dimension setting is critical. VoyageAI's voyage-3 model outputs 1024-dimensional vectors. If you use a different embedding model, check its output dimensions.
  </Step>
</Steps>

***

## Set environment variables

Before running any code, set these environment variables in your terminal:

```bash theme={null}
export REDUCTO_API_KEY="your-reducto-key"
export PINECONE_API_KEY="your-pinecone-key"
export VOYAGEAI_API_KEY="your-voyageai-key"
export AWS_ACCESS_KEY_ID="your-aws-access-key"
export AWS_SECRET_ACCESS_KEY="your-aws-secret-key"
export S3_BUCKET_NAME="your-bucket-name"
```

***

## Sample document

For this cookbook, we'll use an open-access research paper from PubMed Central about hepatitis B treatment. This paper has 16 pages with multiple figures, charts, and data tables, making it ideal for demonstrating multimodal RAG.

<img src="https://mintcdn.com/reducto/9Avr4qdsIoNo7JLQ/images/multimodal-pdf-sample.png?fit=max&auto=format&n=9Avr4qdsIoNo7JLQ&q=85&s=780673074247d881841ec7f63338a7f8" alt="pdf sample used" width="2537" height="1298" data-path="images/multimodal-pdf-sample.png" />

**Download the sample PDF:**

```
https://ftp.ncbi.nlm.nih.gov/pub/pmc/oa_pdf/14/14/WJG-30-1911.PMC11036500.pdf
```

Save it as `research_paper.pdf` in your working directory.

<Tip>
  You can use any PDF with charts or figures. Annual reports, scientific papers, and technical documentation work well for multimodal RAG.
</Tip>

***

## Step 1: Parse the document with image extraction

### Initialize the client

First, we create a Reducto client using our API key:

<CodeGroup>
  ```python Python theme={null}
  import os
  from reducto import Reducto

  client = Reducto(api_key=os.environ["REDUCTO_API_KEY"])
  ```

  ```javascript JavaScript theme={null}
  import Reducto from "reductoai";

  const client = new Reducto({ apiKey: process.env.REDUCTO_API_KEY });
  ```
</CodeGroup>

### Upload the document

Before parsing, we need to upload the file to Reducto's servers. The `upload()` method returns a file ID that we'll use in the parse call:

<CodeGroup>
  ```python Python theme={null}
  from pathlib import Path

  upload = client.upload(file=Path("research_paper.pdf"))

  print(f"File ID: {upload.file_id}")
  ```

  ```javascript JavaScript theme={null}
  import fs from "fs";

  const upload = await client.upload({
    file: fs.createReadStream("research_paper.pdf"),
  });

  console.log(`File ID: ${upload.file_id}`);
  ```
</CodeGroup>

```
File ID: reducto://c0584170-17a0-44f5-baf4-727467b71b84.pdf
```

### Parse with image extraction

Now we parse the document. The key settings here are:

* **`return_images`**: Tells Reducto to crop and return images for figures and tables
* **`chunk_mode`**: Controls how text is grouped into chunks

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      settings={
          "return_images": ["figure", "table"]
      },
      retrieval={
          "chunking": {
              "chunk_mode": "section"
          }
      }
  )

  print(f"Parsed {result.usage.num_pages} pages")
  print(f"Got {len(result.result.chunks)} chunks")
  ```

  ```javascript JavaScript theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    settings: {
      return_images: ["figure", "table"]
    },
    retrieval: {
      chunking: {
        chunk_mode: "section"
      }
    }
  });

  console.log(`Parsed ${result.usage.num_pages} pages`);
  console.log(`Got ${result.result.chunks.length} chunks`);
  ```
</CodeGroup>

```
Parsed 16 pages
Got 39 chunks
```

**Why `chunk_mode: "section"`?**

We use section-based chunking because it keeps figures together with their surrounding explanatory text. If a figure appears in the "Results" section, the chunk will include both the figure and the text that explains it. This improves retrieval quality because the embedding captures the full context.

Other options:

* `page`: One chunk per page. Simpler but may split related content.
* `variable`: Adaptive chunking based on content density.

***

## Step 2: Understand the response structure

Before we start uploading images, let's look at what Reducto actually returns. This helps us understand what data we're working with.

### Exploring chunks and blocks

Each chunk contains multiple blocks. A block can be text, a figure, a table, or other content types:

<CodeGroup>
  ```python Python theme={null}
  # Look at the first chunk
  chunk = result.result.chunks[0]
  print(f"Chunk has {len(chunk.blocks)} blocks")
  print(f"Embed text preview: {chunk.embed[:200]}...")
  ```

  ```javascript JavaScript theme={null}
  // Look at the first chunk
  const chunk = result.result.chunks[0];
  console.log(`Chunk has ${chunk.blocks.length} blocks`);
  console.log(`Embed text preview: ${chunk.embed.slice(0, 200)}...`);
  ```
</CodeGroup>

```
Chunk has 5 blocks
Embed text preview: # Inhibition of hepatitis B virus via selective apoptosis
modulation by Chinese patent medicine Liuweiwuling Tablet

## Core Tip
Liuweiwuling Tablet (LWWL) exhibits a selective pro-apoptotic effect...
```

### Finding figures and tables

Let's find all figures and tables in the document:

<CodeGroup>
  ```python Python theme={null}
  figure_count = 0
  table_count = 0

  for chunk in result.result.chunks:
      for block in chunk.blocks:
          if block.type == "Figure":
              figure_count += 1
          elif block.type == "Table":
              table_count += 1

  print(f"Found {figure_count} figures and {table_count} tables")
  ```

  ```javascript JavaScript theme={null}
  let figureCount = 0;
  let tableCount = 0;

  for (const chunk of result.result.chunks) {
    for (const block of chunk.blocks) {
      if (block.type === "Figure") {
        figureCount++;
      } else if (block.type === "Table") {
        tableCount++;
      }
    }
  }

  console.log(`Found ${figureCount} figures and ${tableCount} tables`);
  ```
</CodeGroup>

```
Found 39 figures and 1 table
```

### Examining a figure block

Each figure block has several important fields:

<CodeGroup>
  ```python Python theme={null}
  # Find the first figure
  for chunk in result.result.chunks:
      for block in chunk.blocks:
          if block.type == "Figure":
              print(f"Type: {block.type}")
              print(f"Page: {block.bbox.page}")
              print(f"Image URL: {block.image_url[:80]}...")
              print(f"Content: {block.content[:150]}...")
              break
      else:
          continue
      break
  ```

  ```javascript JavaScript theme={null}
  // Find the first figure
  outer: for (const chunk of result.result.chunks) {
    for (const block of chunk.blocks) {
      if (block.type === "Figure") {
        console.log(`Type: ${block.type}`);
        console.log(`Page: ${block.bbox.page}`);
        console.log(`Image URL: ${block.image_url.slice(0, 80)}...`);
        console.log(`Content: ${block.content.slice(0, 150)}...`);
        break outer;
      }
    }
  }
  ```
</CodeGroup>

```
Type: Figure
Page: 1
Image URL: https://prod-storage20241010144745140900000001.s3.amazonaws.com/6eb2fb19...
Content: - Title: World Journal of Gastroenterology
- Logo/abbreviation: "W J G" — three white script letters each inside a black square...
```

**Key fields explained:**

| Field             | What it contains                                                |
| ----------------- | --------------------------------------------------------------- |
| `block.type`      | "Figure", "Table", "Text", etc.                                 |
| `block.bbox.page` | Page number (1-indexed)                                         |
| `block.image_url` | Temporary URL to the cropped image. **Expires in 1 hour.**      |
| `block.content`   | AI-generated description of the figure                          |
| `chunk.embed`     | Full text optimized for embedding, includes figure descriptions |

<Note>
  The `content` field contains Reducto's AI-generated description of the figure. This is incredibly useful, as it means your vector search can find figures based on what they show, not just the surrounding text.
</Note>

***

## Step 3: Upload images to S3

Now we need to save these images permanently. Reducto's URLs expire in 1 hour, so we'll upload each image to S3 immediately.

### Initialize the S3 client

<CodeGroup>
  ```python Python theme={null}
  import boto3

  s3 = boto3.client("s3")
  bucket_name = os.environ["S3_BUCKET_NAME"]
  ```

  ```javascript JavaScript theme={null}
  import { S3Client, PutObjectCommand, GetBucketLocationCommand } from "@aws-sdk/client-s3";

  const s3 = new S3Client({ region: "us-east-1" }); // Will be updated after getting bucket region
  const bucketName = process.env.S3_BUCKET_NAME;
  ```
</CodeGroup>

### Determine the bucket region

S3 URLs include the region. If we get this wrong, the URLs won't work:

<CodeGroup>
  ```python Python theme={null}
  region = s3.get_bucket_location(Bucket=bucket_name).get("LocationConstraint")
  if region is None:
      region = "us-east-1"  # us-east-1 returns None for LocationConstraint

  print(f"Bucket region: {region}")
  ```

  ```javascript JavaScript theme={null}
  const locationResponse = await s3.send(
    new GetBucketLocationCommand({ Bucket: bucketName })
  );
  const region = locationResponse.LocationConstraint || "us-east-1";

  console.log(`Bucket region: ${region}`);
  ```
</CodeGroup>

```
Bucket region: ap-south-1
```

### Create the upload function

This function downloads an image from Reducto and uploads it to S3:

<CodeGroup>
  ```python Python theme={null}
  import requests

  def upload_image_to_s3(image_url, s3_key):
      """Download image from Reducto and upload to S3."""
      # Download from Reducto
      response = requests.get(image_url)
      response.raise_for_status()

      # Upload to S3
      s3.put_object(
          Bucket=bucket_name,
          Key=s3_key,
          Body=response.content,
          ContentType="image/png"
      )

      # Return the permanent S3 URL
      return f"https://{bucket_name}.s3.{region}.amazonaws.com/{s3_key}"
  ```

  ```javascript JavaScript theme={null}
  async function uploadImageToS3(imageUrl, s3Key) {
    // Download from Reducto
    const response = await fetch(imageUrl);
    if (!response.ok) throw new Error(`Failed to fetch: ${response.status}`);
    const imageBuffer = Buffer.from(await response.arrayBuffer());

    // Upload to S3
    await s3.send(new PutObjectCommand({
      Bucket: bucketName,
      Key: s3Key,
      Body: imageBuffer,
      ContentType: "image/png"
    }));

    // Return the permanent S3 URL
    return `https://${bucketName}.s3.${region}.amazonaws.com/${s3Key}`;
  }
  ```
</CodeGroup>

**Why this URL format?**

S3 URLs follow the pattern `https://{bucket}.s3.{region}.amazonaws.com/{key}`. Including the region is important, otherwise S3 may redirect requests, which can cause issues with some clients.

### Upload all images

Now we loop through all chunks. For each chunk, we check if it contains any figures or tables. If it does, we upload those images to S3 and store the URLs. If it doesn't, we still index the chunk but without an image URL.

This is the key difference from image-only indexing: **we index everything**, both text-only chunks and chunks with figures.

<CodeGroup>
  ```python Python theme={null}
  all_items = []

  for chunk_idx, chunk in enumerate(result.result.chunks):
      # Find any figures/tables in this chunk
      images_in_chunk = []
      for block in chunk.blocks:
          if block.type in ["Figure", "Table"] and block.image_url:
              image_id = f"chunk-{chunk_idx}-{block.type.lower()}-page{block.bbox.page}"
              s3_key = f"multimodal-rag/{image_id}.png"
              s3_url = upload_image_to_s3(block.image_url, s3_key)
              images_in_chunk.append({
                  "s3_url": s3_url,
                  "block_type": block.type,
                  "page": block.bbox.page
              })

      # Get the page number from the first block
      page = chunk.blocks[0].bbox.page if chunk.blocks else 1

      # Index this chunk (with or without images)
      item = {
          "id": f"chunk-{chunk_idx}",
          "text": chunk.embed,
          "page": page,
          "has_images": len(images_in_chunk) > 0
      }

      # If this chunk has images, include the first one
      # (for chunks with multiple figures, you could store all URLs)
      if images_in_chunk:
          item["image_url"] = images_in_chunk[0]["s3_url"]
          item["block_type"] = images_in_chunk[0]["block_type"]

      all_items.append(item)

  # Count what we have
  chunks_with_images = sum(1 for item in all_items if item.get("has_images"))
  chunks_text_only = len(all_items) - chunks_with_images

  print(f"Total chunks: {len(all_items)}")
  print(f"  - With images: {chunks_with_images}")
  print(f"  - Text only: {chunks_text_only}")
  ```

  ```javascript JavaScript theme={null}
  const allItems = [];

  for (let chunkIdx = 0; chunkIdx < result.result.chunks.length; chunkIdx++) {
    const chunk = result.result.chunks[chunkIdx];

    // Find any figures/tables in this chunk
    const imagesInChunk = [];
    for (const block of chunk.blocks) {
      if (["Figure", "Table"].includes(block.type) && block.image_url) {
        const imageId = `chunk-${chunkIdx}-${block.type.toLowerCase()}-page${block.bbox.page}`;
        const s3Key = `multimodal-rag/${imageId}.png`;
        const s3Url = await uploadImageToS3(block.image_url, s3Key);
        imagesInChunk.push({
          s3_url: s3Url,
          block_type: block.type,
          page: block.bbox.page
        });
      }
    }

    // Get the page number from the first block
    const page = chunk.blocks.length > 0 ? chunk.blocks[0].bbox.page : 1;

    // Index this chunk (with or without images)
    const item = {
      id: `chunk-${chunkIdx}`,
      text: chunk.embed,
      page: page,
      has_images: imagesInChunk.length > 0
    };

    // If this chunk has images, include the first one
    if (imagesInChunk.length > 0) {
      item.image_url = imagesInChunk[0].s3_url;
      item.block_type = imagesInChunk[0].block_type;
    }

    allItems.push(item);
  }

  // Count what we have
  const chunksWithImages = allItems.filter(item => item.has_images).length;
  const chunksTextOnly = allItems.length - chunksWithImages;

  console.log(`Total chunks: ${allItems.length}`);
  console.log(`  - With images: ${chunksWithImages}`);
  console.log(`  - Text only: ${chunksTextOnly}`);
  ```
</CodeGroup>

```
Total chunks: 39
  - With images: 33
  - Text only: 6
```

Now our index will contain the entire document. Text-only queries will find relevant text chunks, while queries about figures will find chunks that have associated images.

### Verify an image is accessible

Let's make sure our S3 URLs work:

<CodeGroup>
  ```python Python theme={null}
  # Find a chunk that has an image
  test_item = next(item for item in all_items if item.get("image_url"))
  test_url = test_item["image_url"]
  print(f"Testing: {test_url}")

  response = requests.head(test_url)
  print(f"Status: {response.status_code}")
  ```

  ```javascript JavaScript theme={null}
  // Find a chunk that has an image
  const testItem = allItems.find(item => item.image_url);
  const testUrl = testItem.image_url;
  console.log(`Testing: ${testUrl}`);

  const testResponse = await fetch(testUrl, { method: "HEAD" });
  console.log(`Status: ${testResponse.status}`);
  ```
</CodeGroup>

```
Testing: https://reducto-multimodal-rag-demo.s3.ap-south-1.amazonaws.com/multimodal-rag/chunk-0-figure-page1.png
Status: 200
```

<Warning>
  If you get a 403 Forbidden error, your bucket policy isn't set correctly. Go back to the S3 setup and make sure you've enabled public access and added the bucket policy.
</Warning>

***

## Step 4: Index into Pinecone

With images safely stored in S3, we can now create vector embeddings and store them in Pinecone.

### Initialize the clients

<CodeGroup>
  ```python Python theme={null}
  from pinecone import Pinecone
  import voyageai

  pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
  index = pc.Index("multimodal-rag")

  vo = voyageai.Client(api_key=os.environ["VOYAGEAI_API_KEY"])
  ```

  ```javascript JavaScript theme={null}
  import { Pinecone } from "@pinecone-database/pinecone";
  import { VoyageAIClient } from "voyageai";

  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
  const index = pc.index("multimodal-rag");

  const vo = new VoyageAIClient({ apiKey: process.env.VOYAGEAI_API_KEY });
  ```
</CodeGroup>

### Understanding what we're indexing

For each chunk, we store:

* **Vector**: Embedding of the text (from `chunk.embed`)
* **Metadata**: Text preview, page number, and optionally an S3 image URL

The vector enables semantic search across the entire document. When a chunk has an associated image, the metadata includes the S3 URL so we can retrieve it at query time.

### Create embeddings and upsert

<CodeGroup>
  ```python Python theme={null}
  for item in all_items:
      # Create embedding from the chunk's embed text
      embedding_response = vo.embed(
          [item["text"][:8000]],  # VoyageAI has input limits
          model="voyage-3"
      )
      embedding = embedding_response.embeddings[0]

      # Build metadata (image_url only included if chunk has images)
      metadata = {
          "text": item["text"][:1000],  # Preview for display
          "page": item["page"],
          "has_images": item["has_images"]
      }

      if item.get("image_url"):
          metadata["image_url"] = item["image_url"]
          metadata["block_type"] = item.get("block_type", "Figure")

      # Upsert to Pinecone
      index.upsert(vectors=[{
          "id": item["id"],
          "values": embedding,
          "metadata": metadata
      }])

  print(f"Indexed {len(all_items)} items")
  ```

  ```javascript JavaScript theme={null}
  for (const item of allItems) {
    // Create embedding from the chunk's embed text
    const embeddingResponse = await vo.embed({
      input: [item.text.slice(0, 8000)],  // VoyageAI has input limits
      model: "voyage-3"
    });
    const embedding = embeddingResponse.data[0].embedding;

    // Build metadata (image_url only included if chunk has images)
    const metadata = {
      text: item.text.slice(0, 1000),  // Preview for display
      page: item.page,
      has_images: item.has_images
    };

    if (item.image_url) {
      metadata.image_url = item.image_url;
      metadata.block_type = item.block_type || "Figure";
    }

    // Upsert to Pinecone
    await index.upsert([{
      id: item.id,
      values: embedding,
      metadata: metadata
    }]);
  }

  console.log(`Indexed ${allItems.length} items`);
  ```
</CodeGroup>

```
Indexed 39 items
```

**Why do we truncate the text?**

* For embeddings (`[:8000]`): VoyageAI has input token limits
* For metadata (`[:1000]`): Pinecone has metadata size limits (\~40KB per vector)

We store enough text in metadata to display a preview, but the full context is captured in the embedding.

<Note>
  VoyageAI's free tier has rate limits (3 requests per minute). For production use with many documents, add rate limiting or upgrade your plan.
</Note>

***

## Step 5: Query and retrieve

Now we can search our index. When someone asks a question, we:

1. Embed their query using the same model (voyage-3)
2. Find the most similar vectors in Pinecone
3. Return the matches with their S3 image URLs

### Create the search function

<CodeGroup>
  ```python Python theme={null}
  def search(query: str, top_k: int = 3):
      """Search for relevant chunks with images."""
      # Embed the query
      query_embedding = vo.embed(
          [query],
          model="voyage-3"
      ).embeddings[0]

      # Search Pinecone
      results = index.query(
          vector=query_embedding,
          top_k=top_k,
          include_metadata=True
      )

      return results.matches
  ```

  ```javascript JavaScript theme={null}
  async function search(query, topK = 3) {
    // Embed the query
    const embeddingResponse = await vo.embed({
      input: [query],
      model: "voyage-3"
    });
    const queryEmbedding = embeddingResponse.data[0].embedding;

    // Search Pinecone
    const results = await index.query({
      vector: queryEmbedding,
      topK: topK,
      includeMetadata: true
    });

    return results.matches;
  }
  ```
</CodeGroup>

### Test the search

<CodeGroup>
  ```python Python theme={null}
  matches = search("cell viability and treatment effects")

  for match in matches:
      print(f"Score: {match.score:.3f}")
      print(f"Page: {match.metadata['page']}")
      print(f"Has image: {match.metadata.get('has_images', False)}")
      if match.metadata.get("image_url"):
          print(f"Image: {match.metadata['image_url']}")
      print(f"Text preview: {match.metadata['text'][:100]}...")
      print("---")
  ```

  ```javascript JavaScript theme={null}
  const matches = await search("cell viability and treatment effects");

  for (const match of matches) {
    console.log(`Score: ${match.score.toFixed(3)}`);
    console.log(`Page: ${match.metadata.page}`);
    console.log(`Has image: ${match.metadata.has_images || false}`);
    if (match.metadata.image_url) {
      console.log(`Image: ${match.metadata.image_url}`);
    }
    console.log(`Text preview: ${match.metadata.text.slice(0, 100)}...`);
    console.log("---");
  }
  ```
</CodeGroup>

```
Score: 0.354
Page: 1
Has image: True
Image: https://reducto-multimodal-rag-demo.s3.ap-south-1.amazonaws.com/multimodal-rag/chunk-8-figure-page1.png
Text preview: # Inhibition of hepatitis B virus via selective apoptosis modulation by Chinese patent medicine Liuwe...
---
Score: 0.227
Page: 1
Has image: True
Image: https://reducto-multimodal-rag-demo.s3.ap-south-1.amazonaws.com/multimodal-rag/chunk-3-figure-page1.png
Text preview: # Inhibition of hepatitis B virus via selective apoptosis modulation by Chinese patent medicine Liuwe...
---
Score: 0.133
Page: 1
Has image: False
Text preview: The study investigated the effects of LWWL on hepatocyte apoptosis using flow cytometry analysis...
---
```

The search returns a mix of results. Some chunks have associated images, others are text-only. Both contribute to answering the question.

***

## Step 6: Send to your LLM

You now have everything needed for multimodal generation:

* **Text context**: `match.metadata["text"]`
* **Image URLs**: `match.metadata["image_url"]` (for chunks with images)

Pass these to any vision-capable LLM (Claude, GPT-4V, Gemini). Most vision APIs accept either image URLs directly or base64-encoded images.

***

## Reducto features for better results

These Reducto settings can improve your multimodal RAG pipeline:

<AccordionGroup>
  <Accordion title="Figure summaries (enabled by default)">
    Reducto automatically generates AI descriptions of figures and includes them in the `embed` field. This is why queries like "show me the revenue chart" can find relevant figures even if "revenue" doesn't appear in surrounding text.

    This is controlled by `summarize_figures` (default: `True`). Keep it enabled for multimodal RAG.
  </Accordion>

  <Accordion title="Agentic figure extraction">
    For documents with complex charts, enable agentic figure extraction for higher accuracy:

    <CodeGroup>
      ```python Python theme={null}
      result = client.parse.run(
          input=upload.file_id,
          settings={"return_images": ["figure", "table"]},
          enhance={
              "agentic": [{"scope": "figure"}]
          }
      )
      ```

      ```javascript JavaScript theme={null}
      const result = await client.parse.run({
        input: upload.file_id,
        settings: { return_images: ["figure", "table"] },
        enhance: {
          agentic: [{ scope: "figure" }]
        }
      });
      ```
    </CodeGroup>

    This uses vision LLMs to better understand chart content. It adds latency and cost but improves extraction quality.
  </Accordion>

  <Accordion title="Embedding-optimized output">
    Enable `embedding_optimized` for output specifically tuned for vector embeddings:

    <CodeGroup>
      ```python Python theme={null}
      result = client.parse.run(
          input=upload.file_id,
          retrieval={
              "chunking": {"chunk_mode": "section"},
              "embedding_optimized": True
          }
      )
      ```

      ```javascript JavaScript theme={null}
      const result = await client.parse.run({
        input: upload.file_id,
        retrieval: {
          chunking: { chunk_mode: "section" },
          embedding_optimized: true
        }
      });
      ```
    </CodeGroup>
  </Accordion>

  <Accordion title="Filter block types">
    If you only want certain content types, use `filter_blocks` to exclude others from the embed field:

    <CodeGroup>
      ```python Python theme={null}
      result = client.parse.run(
          input=upload.file_id,
          retrieval={
              "filter_blocks": ["Header", "Footer", "Page Number"]
          }
      )
      ```

      ```javascript JavaScript theme={null}
      const result = await client.parse.run({
        input: upload.file_id,
        retrieval: {
          filter_blocks: ["Header", "Footer", "Page Number"]
        }
      });
      ```
    </CodeGroup>
  </Accordion>
</AccordionGroup>

***

## Best practices

<AccordionGroup>
  <Accordion title="Choose the right chunking strategy">
    The `chunk_mode` setting affects how figures relate to surrounding text:

    * **`section`**: Groups content by document sections. Best for structured documents like research papers.
    * **`page`**: One chunk per page. Simple and predictable.
    * **`variable`**: Adaptive chunking based on content density.

    For multimodal RAG, `section` usually works best because it keeps figures with their explanatory text.
  </Accordion>

  <Accordion title="Use the embed field for vector search">
    Always use `chunk.embed` (not `chunk.content`) for your vector embeddings. The embed field includes AI-generated figure descriptions that make visual content searchable.
  </Accordion>

  <Accordion title="Not every query needs images">
    Sending images to your LLM adds latency and cost. Consider routing:

    * Simple factual questions → Text-only RAG
    * Questions about trends, comparisons, or visuals → Multimodal RAG

    You can implement this by checking `has_images` in retrieved chunks before deciding which path to take.
  </Accordion>

  <Accordion title="Handle rate limits gracefully">
    VoyageAI's free tier has a 3 requests per minute limit. For production:

    * Batch multiple texts in a single embedding call
    * Add delays between calls
    * Upgrade to a paid plan for higher limits
  </Accordion>

  <Accordion title="Limit retrieved images">
    More images means higher LLM costs and slower responses. For most questions, 2-3 images is sufficient. Use `top_k=3` in your search function.
  </Accordion>
</AccordionGroup>

***

## Complete example

Here's the full pipeline in a single script:

<CodeGroup>
  ```python Python theme={null}
  import os
  from pathlib import Path
  import requests
  import boto3
  from pinecone import Pinecone
  import voyageai
  from reducto import Reducto

  # Initialize clients
  reducto = Reducto()
  s3 = boto3.client("s3")
  pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])
  voyage = voyageai.Client()

  bucket_name = os.environ["S3_BUCKET_NAME"]
  index_name = "multimodal-rag"

  # Get bucket region for S3 URLs
  region = s3.get_bucket_location(Bucket=bucket_name).get("LocationConstraint") or "us-east-1"

  def upload_image_to_s3(image_url, s3_key):
      """Download image from Reducto and upload to S3."""
      response = requests.get(image_url)
      response.raise_for_status()
      s3.put_object(Bucket=bucket_name, Key=s3_key, Body=response.content, ContentType="image/png")
      return f"https://{bucket_name}.s3.{region}.amazonaws.com/{s3_key}"

  def index_document(file_path):
      """Parse document, upload images to S3, and index in Pinecone."""
      # Parse with image extraction
      upload = reducto.upload(file=Path(file_path))

      result = reducto.parse.run(
          input=upload.file_id,
          settings={"return_images": ["figure", "table"]},
          retrieval={"chunking": {"chunk_mode": "section"}}
      )

      # Process chunks and upload images
      records = []
      for i, chunk in enumerate(result.result.chunks):
          image_url = None
          for block in chunk.blocks:
              if block.type in ["Figure", "Table"] and block.image_url:
                  s3_key = f"{result.job_id}/chunk_{i}.png"
                  image_url = upload_image_to_s3(block.image_url, s3_key)
                  break

          # Embed and prepare record
          embedding = voyage.embed([chunk.embed], model="voyage-3").embeddings[0]
          records.append({
              "id": f"{result.job_id}_{i}",
              "values": embedding,
              "metadata": {
                  "text": chunk.embed,
                  "page": chunk.blocks[0].bbox.page if chunk.blocks else 0,
                  "image_url": image_url or "",
                  "has_image": image_url is not None
              }
          })

      # Upsert to Pinecone
      index = pc.Index(index_name)
      index.upsert(vectors=records)
      return len(records)

  def search(query, top_k=3):
      """Search for relevant chunks."""
      embedding = voyage.embed([query], model="voyage-3").embeddings[0]
      index = pc.Index(index_name)
      results = index.query(vector=embedding, top_k=top_k, include_metadata=True)
      return results.matches

  def query_with_images(query):
      """Search and prepare context for LLM."""
      matches = search(query, top_k=3)
      context = "\n\n".join([m.metadata["text"] for m in matches])
      image_urls = [m.metadata["image_url"] for m in matches if m.metadata.get("image_url")]
      return {"context": context, "images": image_urls, "query": query}

  # Usage
  indexed = index_document("research-paper.pdf")
  print(f"Indexed {indexed} chunks")

  result = query_with_images("What do the experimental results show?")
  print(f"Context: {len(result['context'])} chars, Images: {len(result['images'])}")
  # Pass result to your vision LLM
  ```

  ```javascript JavaScript theme={null}
  import fs from "fs";
  import Reducto from "reductoai";
  import { S3Client, PutObjectCommand, GetBucketLocationCommand } from "@aws-sdk/client-s3";
  import { Pinecone } from "@pinecone-database/pinecone";
  import { VoyageAIClient } from "voyageai";

  // Initialize clients
  const reducto = new Reducto();
  const s3 = new S3Client({ region: "us-east-1" });
  const pc = new Pinecone({ apiKey: process.env.PINECONE_API_KEY });
  const voyage = new VoyageAIClient();

  const bucketName = process.env.S3_BUCKET_NAME;
  const indexName = "multimodal-rag";

  // Get bucket region for S3 URLs
  const locationResponse = await s3.send(new GetBucketLocationCommand({ Bucket: bucketName }));
  const region = locationResponse.LocationConstraint || "us-east-1";

  async function uploadImageToS3(imageUrl, s3Key) {
    const response = await fetch(imageUrl);
    if (!response.ok) throw new Error(`Failed to fetch: ${response.status}`);
    const imageBuffer = Buffer.from(await response.arrayBuffer());
    await s3.send(new PutObjectCommand({
      Bucket: bucketName, Key: s3Key, Body: imageBuffer, ContentType: "image/png"
    }));
    return `https://${bucketName}.s3.${region}.amazonaws.com/${s3Key}`;
  }

  async function indexDocument(filePath) {
    // Parse with image extraction
    const upload = await reducto.upload({ file: fs.createReadStream(filePath) });
    const result = await reducto.parse.run({
      input: upload.file_id,
      settings: { return_images: ["figure", "table"] },
      retrieval: { chunking: { chunk_mode: "section" } }
    });

    // Process chunks and upload images
    const records = [];
    for (let i = 0; i < result.result.chunks.length; i++) {
      const chunk = result.result.chunks[i];
      let imageUrl = null;
      for (const block of chunk.blocks) {
        if (["Figure", "Table"].includes(block.type) && block.image_url) {
          const s3Key = `${result.job_id}/chunk_${i}.png`;
          imageUrl = await uploadImageToS3(block.image_url, s3Key);
          break;
        }
      }

      // Embed and prepare record
      const embeddingResponse = await voyage.embed({ input: [chunk.embed], model: "voyage-3" });
      records.push({
        id: `${result.job_id}_${i}`,
        values: embeddingResponse.data[0].embedding,
        metadata: {
          text: chunk.embed,
          page: chunk.blocks.length > 0 ? chunk.blocks[0].bbox.page : 0,
          image_url: imageUrl || "",
          has_image: imageUrl !== null
        }
      });
    }

    // Upsert to Pinecone
    const index = pc.index(indexName);
    await index.upsert(records);
    return records.length;
  }

  async function search(query, topK = 3) {
    const embeddingResponse = await voyage.embed({ input: [query], model: "voyage-3" });
    const index = pc.index(indexName);
    const results = await index.query({
      vector: embeddingResponse.data[0].embedding, topK, includeMetadata: true
    });
    return results.matches;
  }

  async function queryWithImages(query) {
    const matches = await search(query, 3);
    const context = matches.map(m => m.metadata.text).join("\n\n");
    const imageUrls = matches.filter(m => m.metadata.image_url).map(m => m.metadata.image_url);
    return { context, images: imageUrls, query };
  }

  // Usage
  const indexed = await indexDocument("research-paper.pdf");
  console.log(`Indexed ${indexed} chunks`);

  const result = await queryWithImages("What do the experimental results show?");
  console.log(`Context: ${result.context.length} chars, Images: ${result.images.length}`);
  // Pass result to your vision LLM
  ```
</CodeGroup>

***

## Next steps

<CardGroup cols={2}>
  <Card title="Batch Processing" icon="layer-group" href="/cookbooks/batch-processing">
    Process multiple documents in parallel.
  </Card>

  <Card title="Chunking Methods" icon="puzzle-piece" href="/configs/parse/chunking-methods">
    Learn about different chunking strategies.
  </Card>

  <Card title="Chart Extraction" icon="chart-bar" href="/configs/parse/chart-extraction">
    Configure image extraction and other parse options.
  </Card>

  <Card title="Response Format" icon="code" href="/parse/response-format">
    Understand the full parse response structure.
  </Card>
</CardGroup>
