Reducto enforces two independent limit mechanisms. This page covers the concurrency throttle. For the per-second request rate caps that return 429 at the edge, see Rate Limits.
| Mechanism | What it limits | Behavior when exceeded | Returns |
| --- | --- | --- | --- |
| Rate limits | Requests per second to the API | Request is rejected at the ingress | 429 |
| Concurrency throttle | Parse batches running in parallel for your account | Work queues until a slot frees, then runs | 200 (after wait) |
If you submit more parse work than your account’s concurrency ceiling allows, Reducto queues the excess rather than rejecting it. You see added latency, not a 4xx response.

Your Ceiling

ceiling = earned_base + burst_headroom
  • earned_base: capacity sized for your sustained recent traffic, starting from your tier baseline.
  • burst_headroom: short-term slack on top of earned_base so a sudden spike does not immediately queue.
The unit is concurrent batches. Reducto splits a parse job into one or more batches, typically around 10 pages each. A 5-page document runs as a single batch. A 200-page document runs as roughly 20 concurrent batches.
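As a rough illustration of the batch arithmetic above, the sketch below estimates how many concurrent batches a document occupies. It assumes a fixed 10 pages per batch, which the text describes only as typical; the function name and constant are hypothetical and not part of any Reducto SDK.

```python
import math

# Assumption: the "typically around 10 pages" figure, treated as exact here.
PAGES_PER_BATCH = 10


def estimate_batches(page_count: int, pages_per_batch: int = PAGES_PER_BATCH) -> int:
    """Estimate how many concurrent batches a document of page_count pages uses."""
    if page_count <= 0:
        raise ValueError("page_count must be positive")
    # Round up: a partial batch still occupies a full slot.
    return math.ceil(page_count / pages_per_batch)


print(estimate_batches(5))    # 1
print(estimate_batches(200))  # 20
```

The real split may differ from this estimate, so treat the result as a planning figure rather than the exact number of slots consumed.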

Tier Baselines

The baseline is the starting allocation for earned_base in the region you’re hitting. With little or no recent traffic, your ceiling sits around this baseline; sustained traffic raises it above the baseline. Baselines vary per region because shared compute capacity is sized for the typical regional load.
| Tier | US | EU | AU |
| --- | --- | --- | --- |
| Standard | 200 | 60 | 10 |
| Growth | 350 | 120 | 20 |
| Enterprise | 500+ (custom) | 275+ (custom) | 115+ (custom) |
All values are in concurrent batches. The actual raw cap at any moment is higher than the baseline (burst headroom on top) and scales further with sustained traffic. Enterprise baselines are negotiable upward; contact sales to discuss. Multi-region customers get the tier’s baseline in each region they send traffic to.
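To make the table concrete, here is a minimal sketch that estimates how many equally sized documents fit inside a tier baseline before work would start to queue. It assumes 10 pages per batch and ignores burst headroom and earned growth, so it gives a conservative lower bound. The dictionary and function names are hypothetical, and the Enterprise figures are the listed minimums only.

```python
import math

# Baselines from the table above, in concurrent batches.
# Enterprise values are the published minimums; actual values are custom.
TIER_BASELINES = {
    "standard": {"US": 200, "EU": 60, "AU": 10},
    "growth": {"US": 350, "EU": 120, "AU": 20},
    "enterprise": {"US": 500, "EU": 275, "AU": 115},
}


def docs_within_baseline(tier: str, region: str, pages_per_doc: int,
                         pages_per_batch: int = 10) -> int:
    """Lower-bound count of same-sized documents that run fully in parallel."""
    batches_per_doc = math.ceil(pages_per_doc / pages_per_batch)
    # Integer division: a document that only partly fits would partly queue.
    return TIER_BASELINES[tier][region] // batches_per_doc


print(docs_within_baseline("standard", "EU", 200))  # 3
print(docs_within_baseline("growth", "AU", 5))      # 20
```

Because the real ceiling adds burst headroom on top of the baseline, more documents than this may run without queueing.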

How Earned Capacity Grows

Submit consistent traffic and your ceiling grows above the baseline. Reducto measures your submission rate over a short trailing window and sizes your ceiling for that rate plus burst headroom. When you stop submitting, the ceiling decays back toward the baseline over the same window. Bursty traffic gets less headroom than steady traffic at the same average rate.

Sync vs Async Under Throttle

Requests above your ceiling do not fail. They queue until a slot frees, then run. The wait surfaces differently depending on endpoint type:
  • Async (/parse_async, /extract_async, /split_async, /edit_async). The job is accepted immediately, a job_id is returned, and the work is queued. Latency shows up between submission and webhook delivery, never as a 4xx.
  • Sync (/parse, /extract, /split, /edit). The HTTP request blocks until a slot opens and the job completes. Total response time includes queue wait. The edge has a 15-minute (900s) hard timeout, so a sustained burst on sync endpoints risks the HTTP connection timing out before the job finishes.
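One way to act on the 900-second edge timeout is a simple client-side guard that falls back to async when the expected wait gets close to the limit. The sketch below is illustrative only: the wait and processing estimates are values you would supply from your own measurements, and the function name and safety margin are assumptions, not Reducto behavior.

```python
# Hard timeout at the edge for sync endpoints, as stated above.
SYNC_EDGE_TIMEOUT_S = 900


def choose_endpoint(est_queue_wait_s: float, est_processing_s: float,
                    safety_margin: float = 0.5) -> str:
    """Return "sync" if the estimated total time fits comfortably, else "async".

    safety_margin is a hypothetical knob: the fraction of the edge timeout
    we are willing to spend before preferring async.
    """
    budget_s = SYNC_EDGE_TIMEOUT_S * safety_margin
    total_s = est_queue_wait_s + est_processing_s
    return "sync" if total_s <= budget_s else "async"


print(choose_endpoint(10, 60))    # sync
print(choose_endpoint(600, 120))  # async
```

A conservative margin is deliberate here: queue wait under a sustained burst is hard to predict, and a timed-out sync request wastes the time already spent waiting.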
For bursty workloads, use async with webhooks. Your client doesn’t hold open HTTP connections during queue wait.
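```python
import asyncio
from reducto import AsyncReducto

async def submit_burst(files: list[str]):
    """Submit every file as an async parse job and return the job IDs."""
    client = AsyncReducto()
    # Each call returns once the job is accepted; results arrive later
    # via webhook, so no connection stays open during queue wait.
    jobs = await asyncio.gather(*[
        client.parse.run_job(input=f, async_={"webhook": {"mode": "svix"}})
        for f in files
    ])
    return [job.job_id for job in jobs]
```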
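Firing every submission at once with asyncio.gather may run into the separate per-second rate limit for very large bursts. A minimal sketch of one mitigation, capping in-flight submissions with asyncio.Semaphore, follows. The submit_one callable and the fake_submit stand-in are hypothetical; in practice submit_one would wrap the real SDK call, and max_in_flight is an arbitrary value to tune, not a documented limit.

```python
import asyncio
from typing import Awaitable, Callable


async def submit_bounded(files: list[str],
                         submit_one: Callable[[str], Awaitable[str]],
                         max_in_flight: int = 8) -> list[str]:
    """Submit all files, keeping at most max_in_flight requests open at once."""
    semaphore = asyncio.Semaphore(max_in_flight)

    async def guarded(path: str) -> str:
        # Wait for a free slot before sending this submission.
        async with semaphore:
            return await submit_one(path)

    # gather returns results in the same order as the input files.
    return await asyncio.gather(*(guarded(f) for f in files))


async def fake_submit(path: str) -> str:
    """Stand-in for a real submission call; returns a made-up job ID."""
    await asyncio.sleep(0)
    return f"job-for-{path}"


print(asyncio.run(submit_bounded(["a.pdf", "b.pdf"], fake_submit, max_in_flight=1)))
```

This caps only the client's open submission requests. It does not change the account's concurrency ceiling, and accepted jobs still queue on the server side as described above.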
