
This guide covers setting up Hybrid VPC with AWS S3 as your storage backend. Reducto assumes an IAM role in your AWS account to read and write documents.

Prerequisites

  • AWS account(s): Separate accounts can be used for development, staging, and production
  • Terraform 1.2+: For infrastructure provisioning
  • Values from Reducto (provided during onboarding):
    • Principal ARNs for Reducto’s compute services
    • ExternalId for secure role assumption
    • Endpoint Service name and region (if using PrivateLink)

Principal ARNs

Use the appropriate ARNs for your deployment region:
| Environment | EKS Role ARN | Modal User ARN |
| --- | --- | --- |
| Prod (US) | arn:aws:iam::731106932034:role/reducto-prod | arn:aws:iam::731106932034:user/modal-prod |
| Prod-EU | arn:aws:iam::731106932034:role/reducto-prod-eu | arn:aws:iam::731106932034:user/modal-prod-eu |
| Prod-AU | arn:aws:iam::731106932034:role/reducto-prod-au | arn:aws:iam::731106932034:user/modal-prod-au |
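The table above can be kept as a small lookup when generating configuration. The sketch below is illustrative (the environment keys such as "prod-us" are labels chosen here, not Reducto identifiers); the returned list is what goes into reducto_principal_arns in terraform.tfvars.

```python
# Map each Reducto environment to its (EKS role, Modal user) principal ARNs,
# taken verbatim from the table above. The keys are illustrative labels.
PRINCIPAL_ARNS = {
    "prod-us": [
        "arn:aws:iam::731106932034:role/reducto-prod",
        "arn:aws:iam::731106932034:user/modal-prod",
    ],
    "prod-eu": [
        "arn:aws:iam::731106932034:role/reducto-prod-eu",
        "arn:aws:iam::731106932034:user/modal-prod-eu",
    ],
    "prod-au": [
        "arn:aws:iam::731106932034:role/reducto-prod-au",
        "arn:aws:iam::731106932034:user/modal-prod-au",
    ],
}

def principal_arns(environment: str) -> list[str]:
    """Return both principal ARNs to paste into reducto_principal_arns."""
    try:
        return PRINCIPAL_ARNS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None

print(principal_arns("prod-us"))
```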

VPC Endpoint Service Configuration

If using PrivateLink, use the endpoint service closest to your region:
| Environment | VPC Endpoint Service Name | Region | DNS Name |
| --- | --- | --- | --- |
| Prod (US) | com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8 | us-west-2 | hybrid.platform.reducto.ai |
| Prod-EU | com.amazonaws.vpce.eu-central-1.vpce-svc-0a231d441f3a482a0 | eu-central-1 | hybrid.eu.platform.reducto.ai |
| Prod-AU | com.amazonaws.vpce.ap-southeast-2.vpce-svc-0da3ceba709035c36 | ap-southeast-2 | hybrid.au.platform.reducto.ai |
VPC endpoints support cross-region connections. You can create a VPC endpoint in your region that connects to any Reducto endpoint service above, regardless of your VPC’s region.

Setup

1. Clone the infrastructure repository

git clone https://github.com/reductoai-collab/reducto-hybrid-infra.git
cd reducto-hybrid-infra

2. Create terraform.tfvars

name_prefix = "reducto"

# Use the appropriate Principal ARNs for your region
reducto_principal_arns = [
  "arn:aws:iam::731106932034:role/reducto-prod",
  "arn:aws:iam::731106932034:user/modal-prod"
]
reducto_external_id = "<external-id-from-reducto>"

# Optional: customize bucket name (auto-generated if not set)
# bucket_name = "my-company-reducto-data"

# Optional: customize object retention (default: 1 day)
# lifecycle_expiration_days = 1

tags = {
  Environment = "production"
  Project     = "reducto-hybrid"
}

3. Initialize and apply

terraform init
terraform plan
terraform apply

4. Share outputs with Reducto

terraform output integration_values
Example output:
{
  "bucket_name": "reducto-data-a1b2c3d4",
  "region": "us-east-1",
  "role_arn": "arn:aws:iam::987654321098:role/reducto-access",
  "access_mode": "assume_role",
  "privatelink_endpoint_id": null
}
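Before sharing these values with Reducto, it can help to sanity-check the output programmatically. A minimal sketch, assuming the field names shown in the example output above (check_integration_values and REQUIRED are names introduced here for illustration):

```python
import json

# The example `terraform output integration_values` payload from above.
raw = '''{
  "bucket_name": "reducto-data-a1b2c3d4",
  "region": "us-east-1",
  "role_arn": "arn:aws:iam::987654321098:role/reducto-access",
  "access_mode": "assume_role",
  "privatelink_endpoint_id": null
}'''

REQUIRED = {"bucket_name", "region", "role_arn", "access_mode"}

def check_integration_values(payload: str) -> dict:
    """Parse the Terraform output and fail fast if a required field is missing."""
    values = json.loads(payload)
    missing = REQUIRED - values.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return values

values = check_integration_values(raw)
print(values["bucket_name"])
```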

Components provisioned

| Component | Purpose | Required |
| --- | --- | --- |
| S3 Bucket | Document and artifact storage with configurable lifecycle | Yes |
| IAM Role | Cross-account access for Reducto with ExternalId protection | Yes |
| VPC Endpoint | PrivateLink endpoint for private API access | Optional |

Access Modes

To use PrivateLink for private-only API access that never traverses the public internet:

1. Request PrivateLink enablement

Provide the following to your Reducto team:
  • AWS Account ID(s): Where you’ll create the VPC endpoint
  • Region(s): Where you need PrivateLink connectivity
Reducto will enable the VPC Endpoint Service for your account(s). You’ll receive confirmation to proceed.

2. Configure VPC endpoint

Add to your Terraform configuration:
enable_privatelink              = true
vpc_id                          = "vpc-0123456789abcdef0"
subnet_ids                      = ["subnet-abc123", "subnet-def456"]
reducto_endpoint_service_name   = "com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8"
reducto_endpoint_service_region = "us-west-2"

3. Configure your client

Use the region-specific DNS name matching your VPC endpoint:
from reducto import Reducto

client = Reducto(
    api_key="your-api-key",
    base_url="https://hybrid.platform.reducto.ai"
)
You must enable private DNS resolution in your VPC endpoint configuration. This is required for the DNS alias to resolve correctly within your VPC.
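To avoid hard-coding the wrong base URL, the region-to-DNS mapping from the endpoint service table above can be centralized. A small sketch (BASE_URLS and base_url_for are helper names introduced here, keyed by the Reducto endpoint service region your VPC endpoint targets):

```python
# Region-specific DNS names from the VPC Endpoint Service table above.
BASE_URLS = {
    "us-west-2": "https://hybrid.platform.reducto.ai",
    "eu-central-1": "https://hybrid.eu.platform.reducto.ai",
    "ap-southeast-2": "https://hybrid.au.platform.reducto.ai",
}

def base_url_for(endpoint_region: str) -> str:
    """Return the base_url matching the Reducto endpoint service region."""
    return BASE_URLS[endpoint_region]

# Usage with the client shown above:
# client = Reducto(api_key="your-api-key",
#                  base_url=base_url_for("eu-central-1"))
print(base_url_for("us-west-2"))
```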

Validation Checklist

After terraform apply, verify your setup:
  • Terraform apply succeeded without errors
  • S3 bucket has lifecycle rule:
    aws s3api get-bucket-lifecycle-configuration --bucket your-bucket-name
    
  • S3 bucket blocks public access: All public access settings should be blocked
  • IAM role trust policy is correct: Verify Reducto principals and ExternalId condition
    aws iam get-role --role-name reducto-access --query 'Role.AssumeRolePolicyDocument'
    
  • If PrivateLink enabled: Endpoint status shows “available”
    aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-xxx
    
  • Smoke test: Run a small Reducto job and verify objects appear in the bucket
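The lifecycle and public-access checks above can also be scripted against the JSON those AWS CLI commands print. A sketch under the assumption that the responses follow the standard S3 API shapes (the sample payloads below are illustrative, not output from a real bucket):

```python
def lifecycle_ok(payload: dict, max_days: int = 1) -> bool:
    """True if at least one enabled rule expires objects within max_days."""
    return any(
        rule.get("Status") == "Enabled"
        and rule.get("Expiration", {}).get("Days", 10**9) <= max_days
        for rule in payload.get("Rules", [])
    )

def public_access_blocked(payload: dict) -> bool:
    """True only if every public-access setting is blocked."""
    cfg = payload.get("PublicAccessBlockConfiguration", {})
    keys = ("BlockPublicAcls", "IgnorePublicAcls",
            "BlockPublicPolicy", "RestrictPublicBuckets")
    return all(cfg.get(k) is True for k in keys)

# Illustrative payloads in the shape the AWS CLI returns:
lifecycle = {"Rules": [{"ID": "expire", "Status": "Enabled",
                        "Expiration": {"Days": 1}}]}
pab = {"PublicAccessBlockConfiguration": {
    "BlockPublicAcls": True, "IgnorePublicAcls": True,
    "BlockPublicPolicy": True, "RestrictPublicBuckets": True}}
print(lifecycle_ok(lifecycle), public_access_blocked(pab))
```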

Multi-Region Setup

Deploy separate infrastructure in each region with region-specific Principal ARNs:
# US East
cd environments/us-east-1
terraform apply -var="name_prefix=reducto-us"

# EU (Frankfurt)
cd ../eu-central-1
terraform apply -var="name_prefix=reducto-eu"

# Asia Pacific (Sydney)
cd ../ap-southeast-2
terraform apply -var="name_prefix=reducto-au"
Each region requires its own IAM role with the region-specific Principal ARNs from the table above.

Multi-Environment Setup

For organizations with separate AWS accounts for dev/staging/prod:
environments/
├── dev/
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   └── terraform.tfvars
└── prod/
    ├── main.tf
    └── terraform.tfvars
Each environment should use a separate Terraform state file, its own S3 bucket and IAM role, and be registered separately with Reducto.

Security

ExternalId protection

The ExternalId in the IAM role trust policy prevents confused deputy attacks. Only requests with the correct ExternalId can assume the role.
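The mechanism can be seen in the shape of the trust policy itself. A minimal sketch of such a policy (the Terraform module manages the real one; trust_policy is a helper name introduced here):

```python
import json

def trust_policy(principal_arns: list[str], external_id: str) -> dict:
    """Illustrative trust policy: only the listed principals, and only
    requests carrying the agreed ExternalId, may assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": principal_arns},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
        }],
    }

policy = trust_policy(
    ["arn:aws:iam::731106932034:role/reducto-prod"], "example-id")
print(json.dumps(policy, indent=2))
```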

Principle of least privilege

The IAM role grants only the permissions necessary for Reducto operations:
  • s3:GetObject — Read documents
  • s3:PutObject — Write results and artifacts
  • s3:DeleteObject — Clean up temporary files
  • s3:ListBucket — List objects for batch operations
  • s3:AbortMultipartUpload, s3:ListMultipartUploadParts — Handle large file uploads
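The actions listed above map onto a policy document like the following sketch (the Terraform module defines the real policy; note that s3:ListBucket attaches to the bucket ARN while object-level actions attach to keys inside it):

```python
def reducto_s3_policy(bucket_arn: str) -> dict:
    """Sketch of a least-privilege policy containing only the actions above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # ListBucket applies to the bucket itself
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {   # object-level actions apply to keys inside the bucket
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject", "s3:PutObject", "s3:DeleteObject",
                    "s3:AbortMultipartUpload", "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }

print(reducto_s3_policy("arn:aws:s3:::reducto-data-a1b2c3d4"))
```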

Automatic data cleanup

Objects expire automatically based on the lifecycle configuration (default: 24 hours). This ensures no long-term data persistence, compliance with retention policies, and automatic cleanup of intermediate artifacts.
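The lifecycle rule behind this behavior can be sketched as follows, parameterized the same way as lifecycle_expiration_days in terraform.tfvars (the rule ID and the inclusion of AbortIncompleteMultipartUpload are assumptions here, not details confirmed by the module):

```python
def lifecycle_configuration(expiration_days: int = 1) -> dict:
    """Illustrative S3 lifecycle configuration: expire every object after
    expiration_days (default 1, i.e. 24 hours)."""
    return {
        "Rules": [{
            "ID": "reducto-expire-objects",   # illustrative rule ID
            "Status": "Enabled",
            "Filter": {"Prefix": ""},         # apply to every object
            "Expiration": {"Days": expiration_days},
            # Assumed: also clean up stalled multipart uploads.
            "AbortIncompleteMultipartUpload": {
                "DaysAfterInitiation": expiration_days},
        }],
    }

print(lifecycle_configuration()["Rules"][0]["Expiration"])
```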