This guide covers setting up Hybrid VPC with AWS S3 as your storage backend. Reducto assumes an IAM role in your AWS account to read and write documents.

Prerequisites

  • AWS account(s): You can use separate accounts for development, staging, and production
  • Terraform 1.2+: For infrastructure provisioning
  • Values from Reducto (provided during onboarding):
    • Principal ARNs for Reducto’s compute services
    • ExternalId for secure role assumption
    • Endpoint Service name and region (if using PrivateLink)
  • If using PrivateLink: Send Reducto every AWS account ID that will create a VPC endpoint, including separate dev, staging, production, or organizational accounts. Reducto must allow-list each account on the VPC Endpoint Service before endpoint creation succeeds.

Principal ARNs

Use the appropriate ARNs for your deployment region:
| Environment | EKS Role ARN | Modal User ARN |
|---|---|---|
| Prod (US) | arn:aws:iam::731106932034:role/reducto-prod | arn:aws:iam::731106932034:user/modal-prod |
| Prod-EU | arn:aws:iam::731106932034:role/reducto-prod-eu | arn:aws:iam::731106932034:user/modal-prod-eu |
| Prod-AU | arn:aws:iam::731106932034:role/reducto-prod-au | arn:aws:iam::731106932034:user/modal-prod-au |

VPC Endpoint Service Configuration

If using PrivateLink, use the endpoint service closest to your region:
| Environment | VPC Endpoint Service Name | Region | DNS Name |
|---|---|---|---|
| Prod (US) | com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8 | us-west-2 | hybrid.platform.reducto.ai |
| Prod-EU | com.amazonaws.vpce.eu-central-1.vpce-svc-0a231d441f3a482a0 | eu-central-1 | hybrid.eu.platform.reducto.ai |
| Prod-AU | com.amazonaws.vpce.ap-southeast-2.vpce-svc-0da3ceba709035c36 | ap-southeast-2 | hybrid.au.platform.reducto.ai |

VPC endpoints support cross-region connections. You can create a VPC endpoint in your region that connects to any Reducto endpoint service above, regardless of your VPC’s region.

PrivateLink endpoint creation only works after Reducto has allow-listed the AWS account that creates the endpoint. If your organization uses separate AWS accounts for dev, staging, production, or separate business units, provide each account ID before setup.
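As an illustration, the two tables above can be kept in one lookup so that the Principal ARNs, endpoint service, and DNS name for an environment always travel together. This is a minimal sketch: the ARNs, service names, regions, and DNS names are copied from the tables, while the dictionary keys (`prod-us`, `prod-eu`, `prod-au`) and the `ReductoEnvironment` and `lookup` names are hypothetical and not part of any Reducto tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReductoEnvironment:
    """Reducto-side values for one deployment environment."""

    eks_role_arn: str
    modal_user_arn: str
    endpoint_service_name: str
    endpoint_service_region: str
    dns_name: str

    @property
    def principal_arns(self) -> list[str]:
        # Both principals must be allowed to assume your IAM role.
        return [self.eks_role_arn, self.modal_user_arn]


# Values copied from the tables above; the keys are illustrative labels.
ENVIRONMENTS = {
    "prod-us": ReductoEnvironment(
        eks_role_arn="arn:aws:iam::731106932034:role/reducto-prod",
        modal_user_arn="arn:aws:iam::731106932034:user/modal-prod",
        endpoint_service_name="com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8",
        endpoint_service_region="us-west-2",
        dns_name="hybrid.platform.reducto.ai",
    ),
    "prod-eu": ReductoEnvironment(
        eks_role_arn="arn:aws:iam::731106932034:role/reducto-prod-eu",
        modal_user_arn="arn:aws:iam::731106932034:user/modal-prod-eu",
        endpoint_service_name="com.amazonaws.vpce.eu-central-1.vpce-svc-0a231d441f3a482a0",
        endpoint_service_region="eu-central-1",
        dns_name="hybrid.eu.platform.reducto.ai",
    ),
    "prod-au": ReductoEnvironment(
        eks_role_arn="arn:aws:iam::731106932034:role/reducto-prod-au",
        modal_user_arn="arn:aws:iam::731106932034:user/modal-prod-au",
        endpoint_service_name="com.amazonaws.vpce.ap-southeast-2.vpce-svc-0da3ceba709035c36",
        endpoint_service_region="ap-southeast-2",
        dns_name="hybrid.au.platform.reducto.ai",
    ),
}


def lookup(environment: str) -> ReductoEnvironment:
    """Return the settings for an environment label, or raise ValueError."""
    try:
        return ENVIRONMENTS[environment]
    except KeyError:
        known = ", ".join(sorted(ENVIRONMENTS))
        raise ValueError(f"unknown environment {environment!r}; expected one of: {known}") from None
```

For example, `lookup("prod-eu").dns_name` gives the hostname to use as the client `base_url` host for the EU environment.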

Setup

1. Clone the infrastructure repository

git clone https://github.com/reductoai-collab/reducto-hybrid-infra.git
cd reducto-hybrid-infra
2. Create terraform.tfvars

name_prefix = "reducto"

# Use the appropriate Principal ARNs for your region
reducto_principal_arns = [
  "arn:aws:iam::731106932034:role/reducto-prod",
  "arn:aws:iam::731106932034:user/modal-prod"
]
reducto_external_id = "<external-id-from-reducto>"

# Optional: customize bucket name (auto-generated if not set)
# bucket_name = "my-company-reducto-data"

# Optional: customize object retention (default: 1 day)
# lifecycle_expiration_days = 1

tags = {
  Environment = "production"
  Project     = "reducto-hybrid"
}
3. Initialize and apply

terraform init
terraform plan
terraform apply
4. Share outputs with Reducto

terraform output integration_values
Example output:
{
  "bucket_name": "reducto-data-a1b2c3d4",
  "region": "us-east-1",
  "role_arn": "arn:aws:iam::987654321098:role/reducto-access",
  "access_mode": "assume_role",
  "privatelink_endpoint_id": null
}
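
Before sending these values to Reducto, it can help to confirm that the output is complete. The sketch below parses the JSON form of the output (for example from `terraform output -json integration_values`) and checks it. The list of required keys is an assumption taken only from the example output above, and `parse_integration_values` is a hypothetical helper, not part of the infrastructure repository.

```python
import json

# Keys taken from the example output above; the real module may emit more or fewer.
REQUIRED_KEYS = {
    "bucket_name",
    "region",
    "role_arn",
    "access_mode",
    "privatelink_endpoint_id",
}


def parse_integration_values(raw: str) -> dict:
    """Parse the JSON output and fail loudly if something looks incomplete."""
    values = json.loads(raw)
    missing = REQUIRED_KEYS - values.keys()
    if missing:
        raise ValueError(f"integration_values is missing keys: {sorted(missing)}")
    if not values["role_arn"].startswith("arn:aws:iam::"):
        raise ValueError(f"role_arn does not look like an IAM ARN: {values['role_arn']!r}")
    return values


# Sample input copied from the example output above.
SAMPLE = """
{
  "bucket_name": "reducto-data-a1b2c3d4",
  "region": "us-east-1",
  "role_arn": "arn:aws:iam::987654321098:role/reducto-access",
  "access_mode": "assume_role",
  "privatelink_endpoint_id": null
}
"""

values = parse_integration_values(SAMPLE)
print(values["bucket_name"], values["role_arn"])
```

A `null` value for `privatelink_endpoint_id` is accepted here because the example shows it for a deployment without PrivateLink.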

Components provisioned

| Component | Purpose | Required |
|---|---|---|
| S3 Bucket | Document and artifact storage with configurable lifecycle | Yes |
| IAM Role | Cross-account access for Reducto with ExternalId protection | Yes |
| VPC Endpoint | PrivateLink endpoint for private API access | Optional |

Access Modes

For private-only API access that does not traverse the public internet, set up PrivateLink:
1. Request PrivateLink enablement

Provide the following to your Reducto team:
  • AWS Account ID(s): Every account where you’ll create a VPC endpoint, including dev, staging, production, or separate organizational accounts
  • Region(s): Where you need PrivateLink connectivity
Reducto will add each account root as an allowed principal on the VPC Endpoint Service. Wait for Reducto’s confirmation before you create the endpoint.
2. Configure VPC endpoint

Add to your Terraform configuration:
enable_privatelink              = true
vpc_id                          = "vpc-0123456789abcdef0"
subnet_ids                      = ["subnet-abc123", "subnet-def456"]
reducto_endpoint_service_name   = "com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8"
reducto_endpoint_service_region = "us-west-2"
3. Configure your client

Use the region-specific DNS name matching your VPC endpoint:
from reducto import Reducto

client = Reducto(
    api_key="your-api-key",
    base_url="https://hybrid.platform.reducto.ai"
)
You must enable private DNS resolution in your VPC endpoint configuration. This is required for the DNS alias to resolve correctly within your VPC.
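
One way to check private DNS from a host inside the VPC is to resolve the hostname and confirm that every returned address is private. The sketch below uses only the Python standard library. It assumes that your VPC uses private address ranges, so that the endpoint's network interfaces have private IPs; `resolved_addresses` and `all_private` are hypothetical helper names.

```python
import ipaddress
import socket


def resolved_addresses(hostname: str, port: int = 443) -> set[str]:
    """Resolve a hostname and return the set of IP addresses it maps to."""
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def all_private(addresses) -> bool:
    """True only if at least one address is given and all of them are private."""
    addrs = list(addresses)
    return bool(addrs) and all(ipaddress.ip_address(a).is_private for a in addrs)


# Offline examples: a pair of private addresses, then a mix with a public one.
print(all_private({"10.0.1.25", "172.31.4.7"}))
print(all_private({"10.0.1.25", "52.10.20.30"}))
```

From inside the VPC, `all_private(resolved_addresses("hybrid.platform.reducto.ai"))` returning `False` would suggest that private DNS is not enabled on the endpoint or that the host is using a different resolver.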

Validation Checklist

After terraform apply, verify your setup:
  • Terraform apply succeeded without errors
  • S3 bucket has lifecycle rule:
    aws s3api get-bucket-lifecycle-configuration --bucket your-bucket-name
    
  • S3 bucket blocks public access: All public access settings should be blocked
  • IAM role trust policy is correct: Verify Reducto principals and ExternalId condition
    aws iam get-role --role-name reducto-access --query 'Role.AssumeRolePolicyDocument'
    
  • If PrivateLink enabled: Reducto has confirmed that every endpoint-creating AWS account is allow-listed
  • If PrivateLink enabled: Endpoint status shows “available”
    aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-xxx
    
  • Smoke test: Run a small Reducto job and verify objects appear in the bucket
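
Two of the checks above can be scripted against the JSON that the AWS CLI commands print. The sketch below is a hedged example: it relies on the `Rules`, `Status`, and `Expiration.Days` fields of the lifecycle output and the `VpcEndpoints`, `VpcEndpointId`, and `State` fields of the endpoint output. The helper names and the sample values are hypothetical.

```python
import json


def has_expiration_rule(lifecycle_json: str, max_days: int = 1) -> bool:
    """True if an enabled rule expires objects within max_days."""
    rules = json.loads(lifecycle_json).get("Rules", [])
    for rule in rules:
        days = rule.get("Expiration", {}).get("Days")
        if rule.get("Status") == "Enabled" and days is not None and days <= max_days:
            return True
    return False


def endpoint_states(describe_json: str) -> dict:
    """Map each VPC endpoint ID to its current state."""
    endpoints = json.loads(describe_json).get("VpcEndpoints", [])
    return {e["VpcEndpointId"]: e["State"] for e in endpoints}


# Illustrative samples shaped like the CLI output; IDs and values are made up.
LIFECYCLE_SAMPLE = json.dumps(
    {"Rules": [{"ID": "expire-objects", "Status": "Enabled", "Expiration": {"Days": 1}}]}
)
ENDPOINT_SAMPLE = json.dumps(
    {"VpcEndpoints": [{"VpcEndpointId": "vpce-0123456789abcdef0", "State": "available"}]}
)

print(has_expiration_rule(LIFECYCLE_SAMPLE))
print(endpoint_states(ENDPOINT_SAMPLE))
```

If you set a longer `lifecycle_expiration_days`, pass the same number as `max_days` so the check matches your configuration.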

Troubleshooting

VPC endpoint service does not exist

Problem: AWS returns InvalidServiceName or says the VPC Endpoint Service does not exist, but the service name and region match the table above.

Solution: Send Reducto the AWS account ID for the account creating the endpoint. Reducto will allow-list the account root on the endpoint service. After Reducto confirms the change, retry endpoint creation. Repeat this for each account that will create an endpoint.

Multi-Region Setup

Deploy separate infrastructure in each region with region-specific Principal ARNs:
# US East
cd environments/us-east-1
terraform apply -var="name_prefix=reducto-us"

# EU (Frankfurt)
cd ../eu-central-1
terraform apply -var="name_prefix=reducto-eu"

# Asia Pacific (Sydney)
cd ../ap-southeast-2
terraform apply -var="name_prefix=reducto-au"
Each region requires its own IAM role with the region-specific Principal ARNs from the table above.
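
To avoid pairing a region with the wrong principals, the per-region variable files can be generated rather than edited by hand. The sketch below renders `terraform.tfvars` text using the variable names from the setup steps (`name_prefix`, `reducto_principal_arns`, `reducto_external_id`). The environment labels and the `render_tfvars` helper are hypothetical, and which Reducto environment each of your regions should use is something to confirm with Reducto.

```python
import json

# Principal ARNs copied from the table in this guide; the keys are illustrative labels.
PRINCIPAL_ARNS = {
    "prod-us": [
        "arn:aws:iam::731106932034:role/reducto-prod",
        "arn:aws:iam::731106932034:user/modal-prod",
    ],
    "prod-eu": [
        "arn:aws:iam::731106932034:role/reducto-prod-eu",
        "arn:aws:iam::731106932034:user/modal-prod-eu",
    ],
    "prod-au": [
        "arn:aws:iam::731106932034:role/reducto-prod-au",
        "arn:aws:iam::731106932034:user/modal-prod-au",
    ],
}


def render_tfvars(environment: str, name_prefix: str, external_id: str) -> str:
    """Render terraform.tfvars text for one region-specific deployment."""
    arns = PRINCIPAL_ARNS[environment]
    # json.dumps yields double-quoted strings, which is valid for these plain ASCII values.
    arn_lines = ",\n".join(f"  {json.dumps(arn)}" for arn in arns)
    return (
        f"name_prefix = {json.dumps(name_prefix)}\n\n"
        f"reducto_principal_arns = [\n{arn_lines}\n]\n"
        f"reducto_external_id = {json.dumps(external_id)}\n"
    )


print(render_tfvars("prod-eu", "reducto-eu", "<external-id-from-reducto>"))
```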

Multi-Environment Setup

For organizations with separate AWS accounts for dev/staging/prod:
environments/
├── dev/
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   └── terraform.tfvars
└── prod/
    ├── main.tf
    └── terraform.tfvars
Each environment should use a separate Terraform state file and its own S3 bucket and IAM role. If the environments share the same Reducto org, Reducto can register them as named Hybrid VPC environments instead of separate orgs.

Multiple Buckets for One Reducto Org

For workflows that need client-specific buckets under one Reducto organization, register each bucket/role pair as a named environment:
{
  "default_environment": "client-a",
  "environments": {
    "client-a": {
      "region": "us-east-1",
      "bucket": "client-a-reducto-data",
      "role_arn": "arn:aws:iam::987654321098:role/reducto-client-a"
    },
    "client-b": {
      "region": "us-east-1",
      "bucket": "client-b-reducto-data",
      "role_arn": "arn:aws:iam::987654321098:role/reducto-client-b"
    }
  }
}
Then select the environment on each request:
{
  "input": "s3://client-b-reducto-data/documents/invoice.pdf",
  "settings": {
    "hybrid_vpc": {
      "environment": "client-b"
    }
  }
}
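
A small client-side helper can build this request body and catch a mismatched environment before the request is sent. This is a sketch under assumptions: the payload shape is taken from the example above, while the rule that the input bucket must match the selected environment's bucket is a hypothetical guard added for illustration, not documented Reducto behavior. `build_request` is a hypothetical name.

```python
from urllib.parse import urlparse

# Mirrors the named-environment registration shown above.
ENVIRONMENTS = {
    "client-a": {
        "region": "us-east-1",
        "bucket": "client-a-reducto-data",
        "role_arn": "arn:aws:iam::987654321098:role/reducto-client-a",
    },
    "client-b": {
        "region": "us-east-1",
        "bucket": "client-b-reducto-data",
        "role_arn": "arn:aws:iam::987654321098:role/reducto-client-b",
    },
}


def build_request(input_url: str, environment: str, environments: dict = ENVIRONMENTS) -> dict:
    """Build a request body that selects a named Hybrid VPC environment."""
    if environment not in environments:
        raise ValueError(f"unknown environment: {environment!r}")
    parsed = urlparse(input_url)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got: {input_url!r}")
    expected_bucket = environments[environment]["bucket"]
    # Assumed guard: the document should live in the selected environment's bucket.
    if parsed.netloc != expected_bucket:
        raise ValueError(
            f"bucket {parsed.netloc!r} does not match environment {environment!r} "
            f"(expected {expected_bucket!r})"
        )
    return {
        "input": input_url,
        "settings": {"hybrid_vpc": {"environment": environment}},
    }


print(build_request("s3://client-b-reducto-data/documents/invoice.pdf", "client-b"))
```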

Security

ExternalId protection

The ExternalId condition in the IAM role trust policy prevents confused deputy attacks. Only role assumption requests that present the correct ExternalId succeed.
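
For reference, the standard AWS pattern for this protection is a trust policy statement that allows `sts:AssumeRole` only when the `sts:ExternalId` condition key matches. The sketch below builds such a policy and checks an existing one. The policy that the Terraform module actually creates may be structured differently, and both function names are hypothetical.

```python
import json


def build_trust_policy(principal_arns: list, external_id: str) -> dict:
    """Build a trust policy that requires the given ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": list(principal_arns)},
                "Action": "sts:AssumeRole",
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


def requires_external_id(policy: dict, external_id: str) -> bool:
    """True if every Allow statement is conditioned on the given ExternalId."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        # IAM permits a single statement object instead of a list.
        statements = [statements]
    allows = [s for s in statements if s.get("Effect") == "Allow"]
    if not allows:
        return False
    return all(
        s.get("Condition", {}).get("StringEquals", {}).get("sts:ExternalId") == external_id
        for s in allows
    )


policy = build_trust_policy(
    [
        "arn:aws:iam::731106932034:role/reducto-prod",
        "arn:aws:iam::731106932034:user/modal-prod",
    ],
    "<external-id-from-reducto>",
)
print(json.dumps(policy, indent=2))
```

`requires_external_id` can be run against the document returned by `aws iam get-role --query 'Role.AssumeRolePolicyDocument'` after parsing it with `json.loads`.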

Principle of least privilege

The IAM role grants only the permissions necessary for Reducto operations:
  • s3:GetObject — Read documents
  • s3:PutObject — Write results and artifacts
  • s3:DeleteObject — Clean up temporary files
  • s3:ListBucket — List objects for batch operations
  • s3:AbortMultipartUpload, s3:ListMultipartUploadParts — Handle large file uploads
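
As a rough illustration of how these permissions are typically scoped, the sketch below builds an IAM policy document in which `s3:ListBucket` applies to the bucket ARN and the object-level actions apply to the objects under it. The action list comes from the bullets above. The statement layout, the `Sid` values, and the `build_access_policy` name are assumptions, and the policy created by the Terraform module may differ.

```python
import json

# Object-level actions from the list above.
OBJECT_ACTIONS = [
    "s3:GetObject",
    "s3:PutObject",
    "s3:DeleteObject",
    "s3:AbortMultipartUpload",
    "s3:ListMultipartUploadParts",
]


def build_access_policy(bucket_name: str) -> dict:
    """Build a least-privilege policy document scoped to a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it targets the bucket ARN.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Object-level actions target the objects inside the bucket.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": list(OBJECT_ACTIONS),
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


print(json.dumps(build_access_policy("reducto-data-a1b2c3d4"), indent=2))
```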

Automatic data cleanup

Objects expire automatically based on the bucket's lifecycle configuration (default: 24 hours). This prevents long-term data persistence, supports compliance with retention policies, and cleans up intermediate artifacts automatically.