Option 1: Hybrid VPC deployment
A hybrid model that balances privacy and compute efficiency. All data and storage reside in the customer’s VPC, while ephemeral processing is handled by Reducto’s dedicated GPU infrastructure. The GPU infrastructure on Reducto’s side can be single-tenant if desired, or shared.
Architecture
The S3 bucket and database live in the customer’s environment. Ephemeral workers, which do not persist data beyond memory, are deployed solely for your tenant and connect to your database via a whitelisted outgoing IP.
Benefits
- Maintains data sovereignty (storage & databases stay in the customer’s cloud with separate data and compute planes)
- Offloads model inference to Reducto’s GPU cluster for cost and latency efficiency
- Faster auto-scaling, with no need to provision GPU capacity in your own cloud
- Faster iteration on features and fixes, with a reduced DevOps burden
- Customers can connect their own LLMs or use Reducto’s built-in zero data retention (ZDR) agreements with processors (OpenAI, Vertex, etc.)
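The architecture above has workers reach the customer's database through a whitelisted outgoing IP. As a rough illustration of what that rule amounts to on the customer side, the sketch below checks whether a connecting address falls inside an allowlist. The CIDR ranges and the `is_allowed` helper are hypothetical placeholders (documentation-reserved addresses, not Reducto's real egress IPs), and in practice this check usually lives in a security group or firewall rule rather than in application code.

```python
import ipaddress

# Hypothetical allowlist built from documentation-reserved ranges (RFC 5737).
# These are placeholders, NOT real Reducto egress addresses.
ALLOWED_EGRESS = [
    ipaddress.ip_network("203.0.113.10/32"),   # a single worker egress IP
    ipaddress.ip_network("198.51.100.0/28"),   # a small egress range
]


def is_allowed(source_ip: str, allowlist=ALLOWED_EGRESS) -> bool:
    """Return True if source_ip belongs to any allowlisted network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowlist)


print(is_allowed("203.0.113.10"))  # True: exact match on the /32 entry
print(is_allowed("192.0.2.55"))    # False: outside every allowlisted range
```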
Architecture variants
- Dynamic Workers - auto-scales compute based on demand (can scale to zero)
- Dedicated Workers - always-on for predictable throughput
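The difference between the two variants can be sketched as a worker-count rule. This is a minimal illustration, not Reducto's actual autoscaler: the function names, the queue-depth signal, and the default values (`jobs_per_worker`, `max_workers`, `pool_size`) are all assumptions made for the example.

```python
import math


def desired_workers(queued_jobs: int, jobs_per_worker: int,
                    min_workers: int, max_workers: int) -> int:
    """Workers needed to drain the queue, clamped to [min_workers, max_workers]."""
    needed = math.ceil(queued_jobs / jobs_per_worker)
    return max(min_workers, min(needed, max_workers))


def dynamic_workers(queued_jobs: int, jobs_per_worker: int = 4,
                    max_workers: int = 20) -> int:
    """Dynamic variant: the floor is zero, so an idle queue scales to zero."""
    return desired_workers(queued_jobs, jobs_per_worker, 0, max_workers)


def dedicated_workers(queued_jobs: int, pool_size: int = 5) -> int:
    """Dedicated variant: a fixed, always-on pool regardless of current load."""
    return pool_size


for load in (0, 9, 1000):
    print(load, dynamic_workers(load), dedicated_workers(load))
```

Under these assumptions the dynamic pool follows demand (including down to zero workers when idle), while the dedicated pool trades idle cost for throughput that does not depend on scale-up time.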
Option 2: Reducto SaaS API
Reducto offers a SaaS option that eliminates operational overhead while providing enterprise-level security and compliance, making it ideal for teams who want to focus on building applications rather than managing infrastructure.
Architecture
- Built on Amazon Web Services (AWS) and Modal Labs as the primary cloud providers
- Uses AWS S3 for secure data storage with encryption at rest and in transit
Benefits
- Zero infrastructure management
- HIPAA compliance available for Growth and Enterprise tiers
- Zero data retention for the Growth tier and above
- Automatic updates & reliability
- SOC 2 Type II compliant with comprehensive security audits
Option 3: Full VPC deployment
A dedicated deployment fully hosted within the customer’s cloud environment (AWS, GCP, or Azure), ensuring complete data isolation and compliance control.
Architecture
- Reducto provides a container image containing all proprietary models
- Deployment is orchestrated via Helm chart and Terraform templates for seamless setup
- Runs on Kubernetes with PostgreSQL as the minimum database requirement
- GPU usage is configurable depending on workload
- Reducto’s models are optimized to run on CPU where possible
Benefits
Data never leaves the customer’s VPC.
Integration & options
- Customers can connect their own LLMs (OpenAI, Google Vertex AI, Bedrock, etc.)
- Customers can use Reducto’s post-trained LLMs, which require GPU clusters
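As a compact way to compare the three options, the sketch below records where storage and compute sit for each one, as described above, and filters on those two properties. The `DeploymentOption` type, the `candidates` helper, and the two-field model are illustrative simplifications introduced here. They ignore tiers, compliance, and LLM integration choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentOption:
    name: str
    storage: str  # who hosts data at rest: "customer" or "reducto"
    compute: str  # who hosts processing: "customer" or "reducto"


# Placement summarized from the three option descriptions above.
OPTIONS = [
    DeploymentOption("Hybrid VPC deployment", storage="customer", compute="reducto"),
    DeploymentOption("Reducto SaaS API", storage="reducto", compute="reducto"),
    DeploymentOption("Full VPC deployment", storage="customer", compute="customer"),
]


def candidates(require_customer_storage: bool,
               require_offloaded_compute: bool) -> list[str]:
    """Names of options that satisfy the given placement requirements."""
    result = []
    for opt in OPTIONS:
        if require_customer_storage and opt.storage != "customer":
            continue
        if require_offloaded_compute and opt.compute != "reducto":
            continue
        result.append(opt.name)
    return result


# Storage must stay in the customer's cloud, compute offloaded to Reducto.
print(candidates(require_customer_storage=True, require_offloaded_compute=True))
```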