Batch Processing

Reducto provides async SDKs for processing large numbers of documents concurrently, and the service autoscales to handle large jobs. This pattern works across all endpoints: /parse, /extract, and /split.

Make sure you have a Reducto SDK set up. See the Quickstart for instructions.

If you’d rather queue long documents and be notified when they’re done (via polling or webhooks), check out run_job() for a fire-and-forget model. The run_job method also has no limits on how many requests you can send concurrently.

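from reducto import AsyncReducto
from pathlib import Path
import asyncio
from tqdm.asyncio import tqdm

# Client setup is covered in Quickstart.
client = AsyncReducto()

# Upper bound on the number of documents in flight at once.
MAX_CONCURRENCY = 1000
FILES_TO_PARSE = list(Path("docs").glob("*.pdf"))


async def main():
    # The semaphore caps how many parse_document calls run concurrently.
    sem = asyncio.Semaphore(MAX_CONCURRENCY)

    async def parse_document(path: Path):
        async with sem:
            # Upload the local file, then parse the uploaded document.
            upload = await client.upload(file=path)
            result = await client.parse.run(document_url=upload)
            # Save the result next to the source file, e.g. report.reducto.json.
            output_path = path.with_suffix(".reducto.json")
            output_path.write_text(result.model_dump_json())

    # Run all tasks concurrently with a progress bar.
    await tqdm.gather(
        *[parse_document(path) for path in FILES_TO_PARSE], desc="Parsing documents"
    )


if __name__ == "__main__":
    asyncio.run(main())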
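
In the example above, an exception raised for one document propagates out of the gather call, so the script exits with an error even if most files parsed fine. For large batches it can help to record failures per file and retry them later. The following is a minimal sketch of that idea using only the standard library. It is not part of the Reducto SDK: run_batch and fake_parse are hypothetical names, and fake_parse stands in for the upload and parse calls so the snippet runs without an API key.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable

MAX_CONCURRENCY = 4  # illustrative value; tune to your workload


async def run_batch(
    paths: list[Path],
    process_one: Callable[[Path], Awaitable[None]],
    max_concurrency: int = MAX_CONCURRENCY,
) -> tuple[list[Path], dict[Path, BaseException]]:
    """Run process_one over paths with bounded concurrency, collecting failures."""
    sem = asyncio.Semaphore(max_concurrency)

    async def guarded(path: Path) -> None:
        async with sem:
            await process_one(path)

    # return_exceptions=True returns each exception as a result instead of
    # raising it, so one bad document does not abort the whole batch.
    results = await asyncio.gather(
        *(guarded(p) for p in paths), return_exceptions=True
    )

    succeeded: list[Path] = []
    failed: dict[Path, BaseException] = {}
    for path, result in zip(paths, results):  # gather preserves input order
        if isinstance(result, BaseException):
            failed[path] = result
        else:
            succeeded.append(path)
    return succeeded, failed


async def fake_parse(path: Path) -> None:
    # Hypothetical stand-in for the upload + parse calls; fails on "bad*" files.
    await asyncio.sleep(0)
    if path.stem.startswith("bad"):
        raise ValueError(f"could not parse {path.name}")


if __name__ == "__main__":
    demo_paths = [Path("a.pdf"), Path("bad.pdf"), Path("c.pdf")]
    ok, errors = asyncio.run(run_batch(demo_paths, fake_parse))
    print(f"{len(ok)} succeeded, {len(errors)} failed")
    for path, exc in errors.items():
        print(f"  {path}: {exc!r}")
```

To adapt this, pass a coroutine that performs the real upload and parse for one path in place of fake_parse, then feed the keys of the returned failure dict back into run_batch to retry.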
