# Async Processing

> Process documents asynchronously with job queues, polling, and webhooks

Reducto offers two processing modes: synchronous and asynchronous. The SDK provides `run()` and `run_job()` methods that map to different API endpoints:

| SDK Method               | API Endpoint        | Returns                             |
| ------------------------ | ------------------- | ----------------------------------- |
| `client.parse.run()`     | `POST /parse`       | Full result (blocks until complete) |
| `client.parse.run_job()` | `POST /parse_async` | Job ID (returns immediately)        |

The same pattern applies to all endpoints: `/extract` vs `/extract_async`, `/split` vs `/split_async`, and `/pipeline` vs `/pipeline_async`.

<Note>
  The Go SDK is currently in alpha and has limited async support. Go users should use the REST API directly for async operations. See the cURL examples below.
</Note>

## run() vs run\_job()

| Method      | Behavior                                   | Best for                                            |
| ----------- | ------------------------------------------ | --------------------------------------------------- |
| `run()`     | Calls sync endpoint, blocks until complete | Interactive applications, smaller documents         |
| `run_job()` | Calls async endpoint, returns job ID       | Large documents, high volume, background processing |

Both methods produce the same results. The difference is whether you wait synchronously or retrieve results later.

### Synchronous: run()

<CodeGroup>
  ```python Python theme={null}
  from reducto import Reducto

  client = Reducto()

  # Blocks until parsing completes (may take seconds to minutes)
  result = client.parse.run(input="https://example.com/document.pdf")
  print(result.result.chunks)
  ```

  ```typescript TypeScript theme={null}
  import Reducto from "reductoai";

  const client = new Reducto();

  // Blocks until parsing completes
  const result = await client.parse.run({ input: "https://example.com/document.pdf" });
  console.log(result.result.chunks);
  ```

  ```go Go theme={null}
  package main

  import (
      "context"
      "fmt"
      "os"

      reducto "github.com/reductoai/reducto-go-sdk"
      "github.com/reductoai/reducto-go-sdk/option"
      "github.com/reductoai/reducto-go-sdk/shared"
  )

  func main() {
      client := reducto.NewClient(option.WithAPIKey(os.Getenv("REDUCTO_API_KEY")))

      // Blocks until parsing completes
      result, err := client.Parse.Run(context.Background(), reducto.ParseRunParams{
          ParseConfig: reducto.ParseConfigParam{
              DocumentURL: reducto.F[reducto.ParseConfigDocumentURLUnionParam](
                  shared.UnionString("https://example.com/document.pdf"),
              ),
          },
      })
      if err != nil {
          panic(err)
      }

      fmt.Println(result.Result.Chunks)
  }
  ```

  ```bash cURL theme={null}
  curl -X POST "https://platform.reducto.ai/parse" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": "https://example.com/document.pdf"}'
  ```
</CodeGroup>

The `run()` method handles the job lifecycle internally and holds the request open until the result is ready. If the document takes too long to process, the request may time out. For documents over 50 pages, or for complex processing, consider using `run_job()` instead.

### Asynchronous: run\_job()

<CodeGroup>
  ```python Python theme={null}
  from reducto import Reducto

  client = Reducto()

  # Returns immediately with job ID
  submission = client.parse.run_job(input="https://example.com/document.pdf")
  print(f"Job submitted: {submission.job_id}")

  # Retrieve results later via polling or webhook
  ```

  ```typescript TypeScript theme={null}
  import Reducto from "reductoai";

  const client = new Reducto();

  // Returns immediately with job ID
  const submission = await client.parse.runJob({ input: "https://example.com/document.pdf" });
  console.log(`Job submitted: ${submission.job_id}`);

  // Retrieve results later via polling or webhook
  ```

  ```bash cURL theme={null}
  # Call the async endpoint directly
  curl -X POST "https://platform.reducto.ai/parse_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": "https://example.com/document.pdf"}'

  # Response: {"job_id": "abc123-def456"}
  ```
</CodeGroup>

The `run_job()` method has no limit on concurrent submissions. You can queue thousands of documents and process them in parallel without managing connections or timeouts.
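As a minimal sketch of queuing a batch, the helper below submits each URL with `run_job()` and records the returned job IDs. `submit_all` is a hypothetical helper name, not part of the SDK. It assumes only the `client.parse.run_job(input=...)` call and the `job_id` attribute shown above, and it records a failed submission instead of aborting the whole batch.

```python
from typing import Any, Dict, Iterable, Tuple


def submit_all(client: Any, urls: Iterable[str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Submit every URL via run_job(); return (url -> job_id, url -> error message)."""
    job_ids: Dict[str, str] = {}
    errors: Dict[str, str] = {}
    for url in urls:
        try:
            submission = client.parse.run_job(input=url)
        except Exception as exc:
            # Keep going so one bad document does not stop the batch.
            errors[url] = str(exc)
        else:
            job_ids[url] = submission.job_id
    return job_ids, errors
```

You would call it as `job_ids, errors = submit_all(Reducto(), urls)` and then retrieve each job later by polling or webhook.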

## Job lifecycle

When you submit a job via the async endpoint, it moves through these states:

| Status       | Meaning                                      |
| ------------ | -------------------------------------------- |
| `Pending`    | Job is queued, waiting for a worker          |
| `InProgress` | A worker is actively processing the document |
| `Completing` | Processing finished, results being saved     |
| `Completed`  | Results are ready to retrieve                |
| `Failed`     | Processing failed (check error message)      |

Jobs typically spend most of their time in `Pending` (waiting for capacity) or `InProgress` (actual processing).
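If you branch on these states in your own code, it can help to name them once. The sketch below mirrors the status strings from the table. `JobStatus` and `is_terminal` are illustrative names, not SDK types, and treating an unrecognized status as non-terminal is a design assumption, so pair it with a timeout.

```python
from enum import Enum


class JobStatus(str, Enum):
    """Status strings from the job lifecycle table."""
    PENDING = "Pending"
    IN_PROGRESS = "InProgress"
    COMPLETING = "Completing"
    COMPLETED = "Completed"
    FAILED = "Failed"


# Only these two states mean the job will not change again.
TERMINAL_STATUSES = frozenset({JobStatus.COMPLETED, JobStatus.FAILED})


def is_terminal(status: str) -> bool:
    """Return True when polling can stop for this status string."""
    try:
        return JobStatus(status) in TERMINAL_STATUSES
    except ValueError:
        # Unknown status: treat as non-terminal so the caller keeps polling.
        return False
```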

## Polling for results

The simplest way to get results from an async job is to poll the job status:

<CodeGroup>
  ```python Python theme={null}
  import time
  from reducto import Reducto

  client = Reducto()

  # Submit job
  submission = client.parse.run_job(input="https://example.com/document.pdf")

  # Poll until complete
  while True:
      job = client.job.get(submission.job_id)
      
      if job.status == "Completed":
          print("Success:", job.result)
          break
      elif job.status == "Failed":
          print("Failed:", job.error)
          break
      
      time.sleep(2)
  ```

  ```typescript TypeScript theme={null}
  import Reducto from "reductoai";

  const client = new Reducto();

  // Submit job
  const submission = await client.parse.runJob({ input: "https://example.com/document.pdf" });

  // Poll until complete
  while (true) {
    const job = await client.job.retrieve(submission.job_id);
    
    if (job.status === "Completed") {
      console.log("Success:", job.result);
      break;
    } else if (job.status === "Failed") {
      console.log("Failed:", job.error);
      break;
    }
    
    await new Promise(resolve => setTimeout(resolve, 2000));
  }
  ```

  ```bash cURL theme={null}
  # Submit job
  JOB_ID=$(curl -s -X POST "https://platform.reducto.ai/parse_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": "https://example.com/document.pdf"}' | jq -r '.job_id')

  echo "Job submitted: $JOB_ID"

  # Poll until complete
  while true; do
    RESPONSE=$(curl -s "https://platform.reducto.ai/job/$JOB_ID" \
      -H "Authorization: Bearer $REDUCTO_API_KEY")
    
    STATUS=$(echo "$RESPONSE" | jq -r '.status')
    
    if [ "$STATUS" = "Completed" ]; then
      echo "Success"
      echo "$RESPONSE" | jq '.result'
      break
    elif [ "$STATUS" = "Failed" ]; then
      echo "Failed"
      break
    fi
    
    sleep 2
  done
  ```
</CodeGroup>

Polling is straightforward but requires keeping a process running. For production systems processing many documents, webhooks are more efficient.
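The loops above poll every two seconds with no upper bound. A sketch of a bounded variant with exponential backoff is shown below. `wait_for_job` is a hypothetical helper, not an SDK function. It takes the lookup function as a parameter (for example `client.job.get`, as used above) and assumes only that the returned object has a `status` attribute. The timeout and delay defaults are arbitrary.

```python
import time
from typing import Any, Callable


def wait_for_job(
    get_job: Callable[[str], Any],
    job_id: str,
    timeout: float = 600.0,
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Any:
    """Poll get_job(job_id) until Completed or Failed; raise TimeoutError otherwise."""
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        job = get_job(job_id)
        if job.status in ("Completed", "Failed"):
            return job
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job {job_id} still {job.status} after {timeout}s")
        # Never sleep past the deadline.
        sleep(min(delay, remaining))
        # Exponential backoff, capped at max_delay.
        delay = min(delay * 2, max_delay)
```

Usage would look like `job = wait_for_job(client.job.get, submission.job_id)`. The `sleep` and `clock` parameters exist so the helper can be tested without real waiting.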

## Priority processing

By default, synchronous (`run()`) jobs are prioritized over asynchronous (`run_job()`) jobs. This ensures interactive requests get fast responses while background jobs process when capacity is available.

You can request priority processing for async jobs if your account has priority budget available:

<CodeGroup>
  ```python Python theme={null}
  submission = client.parse.run_job(
      input="https://example.com/urgent-document.pdf",
      async_={
          "priority": True
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const submission = await client.parse.runJob({
    input: "https://example.com/urgent-document.pdf",
    async: {
      priority: true
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST "https://platform.reducto.ai/parse_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/urgent-document.pdf",
      "async": {"priority": true}
    }'
  ```
</CodeGroup>

Priority jobs are processed before non-priority async jobs but may still queue behind synchronous requests.

## Async endpoints

Every Reducto endpoint has a corresponding async variant:

| Sync Endpoint    | Async Endpoint         | SDK Method                  |
| ---------------- | ---------------------- | --------------------------- |
| `POST /parse`    | `POST /parse_async`    | `client.parse.run_job()`    |
| `POST /extract`  | `POST /extract_async`  | `client.extract.run_job()`  |
| `POST /split`    | `POST /split_async`    | `client.split.run_job()`    |
| `POST /pipeline` | `POST /pipeline_async` | `client.pipeline.run_job()` |

<CodeGroup>
  ```python Python theme={null}
  # Parse
  parse_job = client.parse.run_job(input="https://example.com/document.pdf")

  # Extract
  extract_job = client.extract.run_job(
      input="https://example.com/document.pdf",
      instructions={"schema": your_schema}
  )

  # Split
  split_job = client.split.run_job(
      input="https://example.com/document.pdf",
      split_description=[{"name": "Section A", "description": "..."}]
  )

  # Pipeline
  pipeline_job = client.pipeline.run_job(
      input="https://example.com/document.pdf",
      pipeline_id="your_pipeline_id"
  )
  ```

  ```typescript TypeScript theme={null}
  // Parse
  const parseJob = await client.parse.runJob({ input: "https://example.com/document.pdf" });

  // Extract
  const extractJob = await client.extract.runJob({
    input: "https://example.com/document.pdf",
    instructions: { schema: yourSchema }
  });

  // Split
  const splitJob = await client.split.runJob({
    input: "https://example.com/document.pdf",
    split_description: [{ name: "Section A", description: "..." }]
  });

  // Pipeline
  const pipelineJob = await client.pipeline.runJob({
    input: "https://example.com/document.pdf",
    pipeline_id: "your_pipeline_id"
  });
  ```

  ```bash cURL theme={null}
  # Parse async
  curl -X POST "https://platform.reducto.ai/parse_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{"input": "https://example.com/document.pdf"}'

  # Extract async
  curl -X POST "https://platform.reducto.ai/extract_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/document.pdf",
      "instructions": {"schema": {"field": "string"}}
    }'

  # Split async
  curl -X POST "https://platform.reducto.ai/split_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/document.pdf",
      "split_description": [{"name": "Section A", "description": "..."}]
    }'

  # Pipeline async
  curl -X POST "https://platform.reducto.ai/pipeline_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/document.pdf",
      "pipeline_id": "your_pipeline_id"
    }'
  ```
</CodeGroup>

## Using metadata

Include metadata with your job submission to help identify and route results:

<CodeGroup>
  ```python Python theme={null}
  submission = client.parse.run_job(
      input="https://example.com/document.pdf",
      async_={
          "metadata": {
              "user_id": "user_123",
              "document_type": "invoice",
              "batch_id": "batch_456"
          }
      }
  )
  ```

  ```typescript TypeScript theme={null}
  const submission = await client.parse.runJob({
    input: "https://example.com/document.pdf",
    async: {
      metadata: {
        user_id: "user_123",
        document_type: "invoice",
        batch_id: "batch_456"
      }
    }
  });
  ```

  ```bash cURL theme={null}
  curl -X POST "https://platform.reducto.ai/parse_async" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/document.pdf",
      "async": {
        "metadata": {
          "user_id": "user_123",
          "document_type": "invoice",
          "batch_id": "batch_456"
        }
      }
    }'
  ```
</CodeGroup>

The metadata is included in webhook notifications, making it easy to match results back to your application context without maintaining a separate mapping.
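As an illustration of routing on metadata, the sketch below dispatches a notification to a handler chosen by `document_type`. The payload shape is an assumption made for this example (a dict with `job_id` and `metadata` keys); check the webhook documentation for the actual format. `route_notification` and the handler signature are hypothetical names.

```python
from typing import Any, Callable, Dict

# Handler signature: (job_id, metadata) -> a short label describing what was done.
Handler = Callable[[str, Dict[str, Any]], str]


def route_notification(
    payload: Dict[str, Any],
    handlers: Dict[str, Handler],
    default: Handler,
) -> str:
    """Dispatch a notification to the handler registered for its document_type."""
    # ASSUMPTION: the payload is a dict with "job_id" and "metadata" keys.
    metadata = payload.get("metadata") or {}
    handler = handlers.get(metadata.get("document_type"), default)
    return handler(payload["job_id"], metadata)
```

Because the routing key travels with the job, the handler needs no lookup table from job ID to document type.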

## When to use async

**Use `run()` when:**

* Processing single documents interactively
* Document size is small (under 20 pages)
* You need results immediately in the same request
* Testing and development

**Use `run_job()` / async endpoints when:**

* Processing many documents in parallel
* Documents are large or complex
* You want fire-and-forget with webhook notification
* Building batch processing pipelines
* Processing in background workers
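The guidance above can be sketched as a small decision helper. `should_use_async` is a hypothetical function, and its default threshold simply reuses the 20-page figure from the list. Treat it as a starting heuristic rather than a rule enforced by the API.

```python
def should_use_async(
    page_count: int,
    batch_size: int = 1,
    needs_immediate_result: bool = True,
    small_doc_pages: int = 20,
) -> bool:
    """Return True when run_job() looks like the better fit, False for run()."""
    if batch_size > 1:
        # Many documents in parallel: submit them as background jobs.
        return True
    if page_count >= small_doc_pages:
        # Large documents risk timing out on the synchronous endpoint.
        return True
    # A single small document: stay synchronous only if the caller is waiting.
    return not needs_immediate_result
```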

## Job Retention

<Warning>
  **Jobs are deleted after 12 hours.** This is part of Reducto's zero data retention (ZDR) policy. If you query a job ID from more than 12 hours ago, you'll receive a "Job not found" error.
</Warning>

**Default behavior:** Job results are retained for 12 hours. After this window, you'll need to reprocess the document.
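If you record when each job was submitted, you can avoid querying jobs that have probably been deleted. The sketch below is a hypothetical helper. It assumes the 12-hour window is measured from your own recorded submission time, which is a conservative approximation, and it expects timezone-aware datetimes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

RETENTION_WINDOW = timedelta(hours=12)  # from the warning above


def is_past_retention(submitted_at: datetime, now: Optional[datetime] = None) -> bool:
    """Return True if a job submitted at `submitted_at` is likely no longer retrievable."""
    now = now or datetime.now(timezone.utc)
    return now - submitted_at >= RETENTION_WINDOW
```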

**For longer retention:** Enable `persist_results` to keep results indefinitely:

<CodeGroup>
  ```python Python theme={null}
  result = client.parse.run(
      input="https://example.com/document.pdf",
      settings={"persist_results": True}
  )
  # This job's results will be stored indefinitely
  ```

  ```typescript TypeScript theme={null}
  const result = await client.parse.run({
    input: "https://example.com/document.pdf",
    settings: { persist_results: true }
  });
  // This job's results will be stored indefinitely
  ```

  ```bash cURL theme={null}
  curl -X POST "https://platform.reducto.ai/parse" \
    -H "Authorization: Bearer $REDUCTO_API_KEY" \
    -H "Content-Type: application/json" \
    -d '{
      "input": "https://example.com/document.pdf",
      "settings": {"persist_results": true}
    }'
  ```
</CodeGroup>

<Note>
  `persist_results` requires opting in to Reducto Studio. Contact support to enable this feature for your organization.
</Note>

**Best practice:** Always store results in your own database when you receive them via polling or webhook, rather than relying on Reducto's retention.
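One way to follow this practice is to write each result to a local table keyed by job ID as soon as it arrives. The sketch below uses Python's built-in `sqlite3` module. The table name, column names, and helper functions are all illustrative choices, and it assumes the result has already been converted to JSON-serializable data (SDK response objects may need converting to plain dicts first).

```python
import json
import sqlite3
from typing import Any, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a SQLite database with a table for job results."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_results ("
        "job_id TEXT PRIMARY KEY, "
        "result_json TEXT NOT NULL, "
        "stored_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
    )
    return conn


def save_result(conn: sqlite3.Connection, job_id: str, result: Any) -> None:
    """Store a JSON-serializable result, replacing any earlier copy for the job."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO job_results (job_id, result_json) VALUES (?, ?)",
            (job_id, json.dumps(result)),
        )


def load_result(conn: sqlite3.Connection, job_id: str) -> Optional[Any]:
    """Return the stored result for job_id, or None if nothing was saved."""
    row = conn.execute(
        "SELECT result_json FROM job_results WHERE job_id = ?", (job_id,)
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Pass a file path to `open_store()` for durable storage; the in-memory default is only useful for trying the helpers out.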

***

## API Reference

See the full API documentation for async endpoints:

* [Parse Async](/api-reference/async-parse) - `POST /parse_async`
* [Extract Async](/api-reference/extract-async) - `POST /extract_async`
* [Split Async](/api-reference/split-async) - `POST /split_async`
* [Pipeline Async](/api-reference/pipeline-async) - `POST /pipeline_async`
* [Get Job](/api-reference/get-jobs) - `GET /job/{job_id}`

***

## Related

<CardGroup cols={2}>
  <Card title="Svix Webhooks" href="/workflows/svix-webhooks">
    Get notified when jobs complete instead of polling.
  </Card>

  <Card title="Batch Processing" href="/workflows/batch-processing">
    Process many documents in parallel with run().
  </Card>

  <Card title="Chaining Endpoints" href="/workflows/chaining-endpoints">
    Reuse parsed documents across multiple calls.
  </Card>

  <Card title="Pipeline Basics" href="/workflows/pipeline-basics">
    Bundle workflows into a single API call.
  </Card>
</CardGroup>
