# Parse Best Practices

> Key practices for getting the best results from Parse

## 1. Use Variable Chunking for RAG

The default chunking mode (`disabled`) returns the entire document as one chunk. For RAG applications, you need smaller chunks that can be embedded and retrieved independently.

Variable chunking splits at semantic boundaries like section headers, tables, and figures, keeping related content together while creating chunks sized for embedding models.

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      retrieval={
          "chunking": {"chunk_mode": "variable"},
          "embedding_optimized": True
      }
  )

  # Use embed field for vector database, content for display
  for chunk in result.result.chunks:
      vector_db.insert(
          embedding=embed(chunk.embed),
          metadata={"content": chunk.content}
      )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    retrieval: {
      chunking: { chunk_mode: 'variable' },
      embedding_optimized: true
    }
  });

  // Use embed field for vector database, content for display
  for (const chunk of result.result.chunks) {
    await vectorDb.insert({
      embedding: embed(chunk.embed),
      metadata: { content: chunk.content }
    });
  }
  ```

  ```go Go theme={null}
  result, _ := client.Parse.Run(context.Background(), reducto.ParseRunParams{
      ParseConfig: reducto.ParseConfigParam{
          DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
              shared.UnionString(upload.FileID),
          ),
      },
      Retrieval: reducto.F(reducto.RetrievalParam{
          Chunking: reducto.F(reducto.RetrievalChunkingUnionParam{
              OfVariableChunking: &reducto.VariableChunkingConfigParam{
                  ChunkMode: reducto.F(reducto.VariableChunkingConfigChunkModeVariable),
              },
          }),
          EmbeddingOptimized: reducto.F(true),
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "retrieval": {
        "chunking": {"chunk_mode": "variable"},
        "embedding_optimized": true
      }
    }'
  ```
</CodeGroup>

The `embed` field contains table and figure summaries as natural language, which embeds better than raw Markdown tables. The `content` field preserves the original formatting for display.
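
To make the split between the two fields concrete, here is a minimal sketch of turning chunks into vector-store records. It runs on plain dictionaries standing in for parsed chunks. The `to_records` helper, the record shape, and the fallback to `content` when `embed` is empty are illustrative assumptions, not part of the Reducto SDK.

```python
from typing import Any


def to_records(chunks: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Build vector-store records: `embed` text for indexing, `content` for display."""
    records = []
    for index, chunk in enumerate(chunks):
        # Assumption: fall back to `content` if a chunk has no `embed` text.
        text_to_embed = chunk.get("embed") or chunk["content"]
        records.append(
            {
                "id": index,
                "text_to_embed": text_to_embed,
                "metadata": {"content": chunk["content"]},
            }
        )
    return records


# Stand-in chunks; a real response would come from client.parse.run(...).
sample_chunks = [
    {
        "content": "| Item | Total |\n|---|---|\n| Widgets | $40 |",
        "embed": "A table listing one item, Widgets, with a total of $40.",
    },
    {"content": "Payment is due in 30 days.", "embed": ""},
]

records = to_records(sample_chunks)
```

Each record's `text_to_embed` would go to your embedding model, while `metadata["content"]` is what you show to users after retrieval.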

***

## 2. Enable Agentic Mode Only When Needed

Agentic mode uses an LLM to review and correct parsing output. It adds latency and consumes additional credits, so enable it only for documents that need it.

**When to enable `scope: "text"`:**

* Handwritten documents or signatures
* Faded or low-quality scans
* Documents with unusual fonts
* When you see garbled characters in output

**When to enable `scope: "table"`:**

* Tables with misaligned columns after parsing
* Merged cells that didn't parse correctly
* Numbers appearing in wrong columns
* Financial documents where accuracy is critical

**When to enable `scope: "figure"`:**

* Charts and graphs that need data extraction
* [Advanced chart extraction](https://reducto.ai/blog/reducto-chart-extraction) with structured data output
* Diagrams requiring detailed descriptions
* Visual elements where you need numeric data from bar charts, line graphs, or pie charts

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      # All three scopes are shown for reference; keep only the ones your documents need
      enhance={
          "agentic": [
              {"scope": "text"},
              {"scope": "table"},
              {"scope": "figure", "advanced_chart_agent": True}
          ]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    enhance: {
      agentic: [
        { scope: 'text' },
        { scope: 'table' },
        { scope: 'figure', advanced_chart_agent: true }
      ]
    }
  });
  ```

  ```go Go theme={null}
  result, _ := client.Parse.Run(context.Background(), reducto.ParseRunParams{
      ParseConfig: reducto.ParseConfigParam{
          DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
              shared.UnionString(upload.FileID),
          ),
      },
      Enhance: reducto.F(reducto.EnhanceParam{
          Agentic: reducto.F([]reducto.AgenticScopeParam{
              {Scope: reducto.F(reducto.AgenticScopeScopeText)},
              {Scope: reducto.F(reducto.AgenticScopeScopeTable)},
              {Scope: reducto.F(reducto.AgenticScopeScopeFigure), AdvancedChartAgent: reducto.F(true)},
          }),
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "enhance": {
        "agentic": [
          {"scope": "text"},
          {"scope": "table"},
          {"scope": "figure", "advanced_chart_agent": true}
        ]
      }
    }'
  ```
</CodeGroup>

Clean digital PDFs (native text, not scanned) typically parse correctly without agentic mode. Test your document types without it first, then enable only the scopes that fix the problems you observe.
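
One way to keep agentic mode selective is to build the `enhance` value from per-document-type flags. The sketch below is a hypothetical helper (`build_enhance` is not an SDK function); it only assembles the `agentic`, `scope`, and `advanced_chart_agent` keys shown in the examples above.

```python
def build_enhance(
    text: bool = False,
    table: bool = False,
    figure: bool = False,
    advanced_chart_agent: bool = False,
) -> dict:
    """Return an `enhance` config containing only the requested agentic scopes."""
    scopes = []
    if text:
        scopes.append({"scope": "text"})
    if table:
        scopes.append({"scope": "table"})
    if figure:
        figure_scope = {"scope": "figure"}
        if advanced_chart_agent:
            figure_scope["advanced_chart_agent"] = True
        scopes.append(figure_scope)
    # No scopes requested: return an empty dict, which the caller can treat
    # as "omit `enhance` from the request".
    return {"agentic": scopes} if scopes else {}


# Clean digital PDFs: start with no agentic scopes.
baseline = build_enhance()
# Financial statements with misaligned columns: enable only the table scope.
financial = build_enhance(table=True)
```

With this shape, a request would pass `enhance=financial` only when the helper returns a non-empty dict, so the baseline path never pays the extra latency or credits.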

***

## 3. Set Priority for Async Requests

Parse has sync (`/parse`) and async (`/parse_async`) endpoints.

**Async requests without `priority: true` enter a queue** and may experience delays during high traffic. If you're using async for latency-sensitive requests (user-facing features, real-time processing), always set priority.

<CodeGroup>
  ```python Python theme={null}
  job = client.parse.run_job(
      input=upload.file_id,
      async_config={"priority": True}
  )
  ```

  ```javascript Node.js theme={null}
  const job = await client.parse.runJob({
    input: upload.file_id,
    asyncConfig: { priority: true }
  });
  ```

  ```go Go theme={null}
  job, _ := client.Parse.RunJob(context.Background(), reducto.ParseRunJobParams{
      ParseConfig: reducto.ParseConfigParam{
          DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
              shared.UnionString(upload.FileID),
          ),
      },
      Options: reducto.F(reducto.ParseRunJobParamsOptionsParam{
          Priority: reducto.F(true),
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse_async \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "options": {"priority": true}
    }'
  ```
</CodeGroup>

Use async with priority or sync for documents that need speed. Use async without priority for batch processing where latency doesn't matter.
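
The decision above can be written down as a small routing function. This is a sketch only: `choose_submission` and its return shape are hypothetical, and it deliberately returns just the endpoint path and the priority flag rather than a full request body.

```python
def choose_submission(latency_sensitive: bool, use_async: bool) -> dict:
    """Pick an endpoint and priority flag following the guidance above."""
    if not use_async:
        # Sync endpoint: priority is an async setting, so leave it unset.
        return {"endpoint": "/parse", "priority": None}
    # Async: set priority only for latency-sensitive work; batch jobs can queue.
    return {"endpoint": "/parse_async", "priority": latency_sensitive}


user_facing = choose_submission(latency_sensitive=True, use_async=True)
nightly_batch = choose_submission(latency_sensitive=False, use_async=True)
```

Here `user_facing` selects `/parse_async` with priority enabled, while `nightly_batch` uses the same endpoint and accepts queueing.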

***

## 4. Use HTML for Complex Tables

The default table format (`dynamic`) auto-selects HTML or Markdown based on complexity. For documents with complex tables (merged cells, nested headers, multi-row cells), explicitly request HTML.

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      formatting={
          "table_output_format": "html"
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    formatting: {
      table_output_format: 'html'
    }
  });
  ```

  ```go Go theme={null}
  result, _ := client.Parse.Run(context.Background(), reducto.ParseRunParams{
      ParseConfig: reducto.ParseConfigParam{
          DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
              shared.UnionString(upload.FileID),
          ),
      },
      Formatting: reducto.F(reducto.FormattingParam{
          TableOutputFormat: reducto.F(reducto.FormattingTableOutputFormatHTML),
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "formatting": {
        "table_output_format": "html"
      }
    }'
  ```
</CodeGroup>

Markdown tables can't represent merged cells or complex structures. If your tables look broken, switching to HTML usually fixes it. For programmatic access to cell data, use `json` format instead.
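
To illustrate why HTML preserves structure that Markdown loses, the sketch below reads `colspan` and `rowspan` from an HTML table using only the Python standard library. The input string is a hand-written stand-in, not actual Reducto output, and `TableCellParser` is a hypothetical helper that assumes a standard, non-nested `<table>`.

```python
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Collect (text, colspan, rowspan) for each <td>/<th> in an HTML table."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell tuples per <tr>
        self._spans = None  # (colspan, rowspan) while inside a cell
        self._text = []     # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:
                # Tolerate a cell that appears before any <tr>.
                self.rows.append([])
            attr_map = dict(attrs)
            self._spans = (
                int(attr_map.get("colspan") or 1),
                int(attr_map.get("rowspan") or 1),
            )
            self._text = []

    def handle_data(self, data):
        if self._spans is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._spans is not None:
            colspan, rowspan = self._spans
            self.rows[-1].append(("".join(self._text).strip(), colspan, rowspan))
            self._spans = None


# Stand-in table with a header cell merged across two columns.
html_table = (
    "<table>"
    '<tr><th colspan="2">Revenue</th></tr>'
    "<tr><td>Q1</td><td>Q2</td></tr>"
    "</table>"
)

parser = TableCellParser()
parser.feed(html_table)
```

After parsing, `parser.rows[0]` holds a single `("Revenue", 2, 1)` cell, so the merge survives. A Markdown table has no way to express that span.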

***

## 5. Filter Headers and Footers for RAG

Page headers, footers, and page numbers add noise to RAG retrieval. When a user asks about invoice totals, you don't want to retrieve chunks containing "Page 1 of 5" or "Confidential - Do Not Distribute".

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input=upload.file_id,
      retrieval={
          "filter_blocks": ["Header", "Footer", "Page Number"]
      }
  )
  ```

  ```javascript Node.js theme={null}
  const result = await client.parse.run({
    input: upload.file_id,
    retrieval: {
      filter_blocks: ['Header', 'Footer', 'Page Number']
    }
  });
  ```

  ```go Go theme={null}
  result, _ := client.Parse.Run(context.Background(), reducto.ParseRunParams{
      ParseConfig: reducto.ParseConfigParam{
          DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
              shared.UnionString(upload.FileID),
          ),
      },
      Retrieval: reducto.F(reducto.RetrievalParam{
          FilterBlocks: reducto.F([]reducto.RetrievalFilterBlock{
              reducto.RetrievalFilterBlockHeader,
              reducto.RetrievalFilterBlockFooter,
              reducto.RetrievalFilterBlockPageNumber,
          }),
      }),
  })
  ```

  ```bash cURL theme={null}
  curl -X POST https://platform.reducto.ai/parse \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "reducto://your-file-id",
      "retrieval": {
        "filter_blocks": ["Header", "Footer", "Page Number"]
      }
    }'
  ```
</CodeGroup>

The filtered blocks still appear in the `chunks[].blocks` metadata, so you can access them if needed, but they're excluded from the `content` and `embed` fields.
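
If you need the filtered text later (for example, to show a confidentiality banner), you can read it back from the block metadata. The sketch below uses plain dictionaries as stand-ins for a response and assumes each block exposes `type` and `content` keys; check the response format reference for the exact shape. `blocks_of_type` is a hypothetical helper.

```python
def blocks_of_type(chunks, block_types):
    """Return the text of every block whose type is in block_types."""
    wanted = set(block_types)
    return [
        block["content"]
        for chunk in chunks
        for block in chunk.get("blocks", [])
        if block.get("type") in wanted
    ]


# Stand-in chunk: the header and page number are absent from `content`
# but still listed under `blocks`.
sample_chunks = [
    {
        "content": "Invoice total: $1,200",
        "blocks": [
            {"type": "Header", "content": "Confidential - Do Not Distribute"},
            {"type": "Text", "content": "Invoice total: $1,200"},
            {"type": "Page Number", "content": "Page 1 of 5"},
        ],
    }
]

filtered_out = blocks_of_type(sample_chunks, ["Header", "Footer", "Page Number"])
```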

***

## Common Pitfalls

<AccordionGroup>
  <Accordion title="Requests taking too long">
    Agentic mode adds latency because it runs an LLM pass over the output. Disable it, then enable it only for document types that actually need correction. For async calls, make sure `priority: true` is set.
  </Accordion>

  <Accordion title="RAG retrieval quality is poor">
    Check your chunking. If you're using `disabled` (the default), the entire document is returned as one chunk. Switch to `variable` for semantic chunking.
  </Accordion>

  <Accordion title="Tables render incorrectly">
    Switch from `dynamic` to `html` format. Markdown can't handle merged cells.
  </Accordion>

  <Accordion title="Async jobs are slow even for small docs">
    Check that `priority: true` is set. Without it, jobs enter a queue and can wait behind other requests during high traffic.
  </Accordion>

  <Accordion title="Document fails with 'corrupted' or 'invalid' error">
    The PDF file may be malformed or use unsupported encryption. Try opening the file in a PDF viewer to verify it's valid. If the file opens but Reducto fails, the PDF may use non-standard formatting. Re-save it using a tool like Adobe Acrobat or a PDF printer, then retry.
  </Accordion>
</AccordionGroup>
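
For the 'corrupted' or 'invalid' case, a quick local check can catch obviously broken files before you upload them. The sketch below is a heuristic, not a validator: `looks_like_pdf` is a hypothetical helper that only looks for the `%PDF-` header and the `%%EOF` marker, and it does not detect encryption or non-standard formatting.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Heuristic check: PDF header near the start and an EOF marker near the end."""
    # PDF files begin with a "%PDF-" header. Some files carry a few leading
    # bytes, so search the first 1024 bytes instead of requiring offset 0.
    has_header = b"%PDF-" in data[:1024]
    # A complete file ends with a "%%EOF" marker; truncated uploads often lack it.
    has_eof = b"%%EOF" in data[-1024:]
    return has_header and has_eof


complete = looks_like_pdf(b"%PDF-1.7\n1 0 obj\nendobj\ntrailer\n%%EOF\n")
truncated = looks_like_pdf(b"%PDF-1.7\n1 0 obj\n")
not_a_pdf = looks_like_pdf(b"<html><body>Not found</body></html>")
```

A file that fails this check is worth re-exporting before you retry. A file that passes can still be rejected, so opening it in a PDF viewer remains the more reliable test.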

***

## Configuration Reference

For complete details on all options mentioned above, see the dedicated configuration pages:

<CardGroup cols={2}>
  <Card title="Chunking Methods" icon="puzzle-piece" href="/configs/parse/chunking-methods">
    All chunking modes and their use cases.
  </Card>

  <Card title="Agentic Mode" icon="wand-magic-sparkles" href="/configs/parse/agentic-modes">
    When and how to use LLM-assisted parsing.
  </Card>

  <Card title="Table Formats" icon="table" href="/configs/parse/table-output-formats">
    HTML, Markdown, JSON, CSV options.
  </Card>

  <Card title="Configuration Overview" icon="gear" href="/configs/overview">
    Full reference of all configuration options.
  </Card>
</CardGroup>

***

## Related

<CardGroup cols={2}>
  <Card title="Parse Overview" icon="file-lines" href="/parse/overview">
    Quick start and basic usage.
  </Card>

  <Card title="Response Format" icon="brackets-curly" href="/parse/response-format">
    Understanding chunks, blocks, and bounding boxes.
  </Card>
</CardGroup>
