Reducto enforces rate limits to protect infrastructure and ensure fair access for all users. These limits affect how you can call the API, not the total volume you can process.

Current Limits

| Limit Type | Value | Description |
| --- | --- | --- |
| Concurrent requests | 200 | Maximum simultaneous requests to sync endpoints (/parse, /extract, /split, /edit) |
| Requests per second | 500 | Maximum rate of new request submissions |

What This Means

Concurrent requests (200): If you call the /parse endpoint (synchronous), you can have up to 200 parses running simultaneously. Each request occupies a slot until it completes and returns a response.

Requests per second (500): This limit only caps how fast you can submit new jobs, not how many can run in parallel, so 500 requests/second is sufficient even for heavy users.
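If you drive sync endpoints from your own worker pool, cap the pool size below the concurrency limit rather than letting it grow unbounded. Here is a minimal sketch using the Python SDK; the sync Reducto client import (mirroring the AsyncReducto import shown later), the pool size, and the file IDs are illustrative assumptions:
from concurrent.futures import ThreadPoolExecutor

from reducto import Reducto

client = Reducto()
file_ids = ["file_1", "file_2"]  # hypothetical IDs from prior uploads

def parse_one(file_id: str):
    # Each sync call holds one of the 200 concurrency slots until it returns
    return client.parse.run(input=file_id)

# Keep max_workers below 200 to leave headroom for other traffic on the key
with ThreadPoolExecutor(max_workers=150) as pool:
    results = list(pool.map(parse_one, file_ids))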

Sync vs Async Behavior

The limits apply differently to synchronous and asynchronous endpoints:
| Endpoint Type | Concurrency Limit Applies? | Rate Limit Applies? |
| --- | --- | --- |
| /parse (sync) | ✅ Yes, counts toward 200 | ✅ Yes |
| /parse_async | ❌ No, returns immediately | ✅ Yes |
Async endpoints (/parse_async, /extract_async, etc.) return immediately with a job_id. The actual processing happens in a queue, so you can submit a much larger number of jobs without hitting the concurrency limit.
# Sync: Each call blocks until complete (counts toward 200 concurrent)
result = client.parse.run(input=upload.file_id)

# Async: Returns immediately with job_id (no concurrency limit)
job = client.parse.run_job(input=upload.file_id)
# Later: retrieve results
result = client.job.get(job.job_id)

Handling Rate Limit Errors

When you exceed rate limits, the API returns a 429 Too Many Requests error. The SDKs retry automatically with exponential backoff; a manual equivalent for direct REST callers is sketched after the list. If you’re hitting limits frequently:
  1. Switch to async endpoints for batch processing
  2. Add delays between requests (even 100ms helps)
  3. Use separate API keys if you need isolated rate limits for different applications
  4. Contact support if you need higher limits for your use case
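If you call the REST API directly instead of through an SDK, you can reproduce the SDKs’ retry behavior with a small wrapper. A minimal sketch with exponential backoff and jitter: make_request stands in for whatever HTTP call you make, and honoring Retry-After assumes the server sets that header (the exponential fallback works either way):
import random
import time

def call_with_backoff(make_request, max_retries: int = 5):
    # make_request is any callable returning a response object with
    # status_code and headers attributes, e.g. a requests.Response
    for attempt in range(max_retries):
        response = make_request()
        if response.status_code != 429:
            return response
        # Honor Retry-After if present; otherwise back off exponentially
        # (1s, 2s, 4s, ...) with up to 1s of jitter
        delay = float(response.headers.get("Retry-After", 2 ** attempt))
        time.sleep(delay + random.random())
    raise RuntimeError("still rate limited after retries")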

Scaling Strategies

For High Volume Processing

Use async endpoints with webhooks:
import asyncio
from reducto import AsyncReducto

async def process_batch(files: list[str]):
    client = AsyncReducto()
    
    # Submit all jobs (no concurrency limit)
    jobs = await asyncio.gather(*[
        client.parse.run_job(
            input=f,
            async_={"webhook": {"mode": "svix"}}
        )
        for f in files
    ])
    
    return [job.job_id for job in jobs]

# Results delivered via webhook when complete
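If you can’t expose a webhook endpoint, polling with client.job.get (shown earlier) works too. A minimal sketch; the status values ("Completed", "Failed") and the result field are assumptions to verify against the API reference:
import time

from reducto import Reducto

client = Reducto()

def wait_for_jobs(job_ids: list[str], poll_interval: float = 5.0):
    # Poll each submitted job until it finishes; field names are assumptions
    results = {}
    pending = set(job_ids)
    while pending:
        for job_id in list(pending):
            job = client.job.get(job_id)
            if job.status == "Completed":  # assumed status value
                results[job_id] = job.result  # assumed result field
                pending.discard(job_id)
            elif job.status == "Failed":  # assumed status value
                raise RuntimeError(f"Job {job_id} failed")
        if pending:
            time.sleep(poll_interval)
    return results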

For Interactive Applications

Use sync endpoints, which are prioritized over async jobs:
# For user-facing requests, a sync call gives the fastest response;
# sync requests are already prioritized over async
result = client.parse.run(input=upload.file_id)

Best Practices

  1. Batch with async: For processing many documents, use async endpoints. Submit all jobs upfront, then collect results via webhooks or polling.
  2. Don’t parallelize sync calls excessively: If you spawn 500 threads each making sync requests, you’ll hit the 200 concurrent limit. Use async instead.
  3. Implement backoff: If you get a 429, wait before retrying. The SDKs handle this automatically.
  4. Monitor usage: Check your request patterns in Reducto Studio to identify bottlenecks.