# Hybrid VPC — AWS S3

> Set up Hybrid VPC with AWS S3 storage and optional PrivateLink

This guide covers setting up Hybrid VPC with AWS S3 as your storage backend. Reducto assumes an IAM role in your AWS account to read and write documents.

## Prerequisites

* **AWS account(s)**: You can use separate accounts for development, staging, and production
* **Terraform 1.2+**: For infrastructure provisioning
* **Values from Reducto** (provided during onboarding):
  * Principal ARNs for Reducto's compute services
  * ExternalId for secure role assumption
  * Endpoint Service name and region (if using PrivateLink)

### Principal ARNs

Use the appropriate ARNs for your deployment region:

| Environment   | EKS Role ARN                                     | Modal User ARN                                 |
| ------------- | ------------------------------------------------ | ---------------------------------------------- |
| **Prod (US)** | `arn:aws:iam::731106932034:role/reducto-prod`    | `arn:aws:iam::731106932034:user/modal-prod`    |
| **Prod-EU**   | `arn:aws:iam::731106932034:role/reducto-prod-eu` | `arn:aws:iam::731106932034:user/modal-prod-eu` |
| **Prod-AU**   | `arn:aws:iam::731106932034:role/reducto-prod-au` | `arn:aws:iam::731106932034:user/modal-prod-au` |

### VPC Endpoint Service Configuration

If using PrivateLink, use the endpoint service closest to your region:

| Environment   | VPC Endpoint Service Name                                      | Region           | DNS Name                        |
| ------------- | -------------------------------------------------------------- | ---------------- | ------------------------------- |
| **Prod (US)** | `com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8`      | `us-west-2`      | `hybrid.platform.reducto.ai`    |
| **Prod-EU**   | `com.amazonaws.vpce.eu-central-1.vpce-svc-0a231d441f3a482a0`   | `eu-central-1`   | `hybrid.eu.platform.reducto.ai` |
| **Prod-AU**   | `com.amazonaws.vpce.ap-southeast-2.vpce-svc-0da3ceba709035c36` | `ap-southeast-2` | `hybrid.au.platform.reducto.ai` |

<Note>
  VPC endpoints support cross-region connections. You can create a VPC endpoint in your region that connects to any Reducto endpoint service above, regardless of your VPC's region.
</Note>
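To avoid mixing values from different rows, the two tables above can be kept in one lookup. The following is a minimal sketch, not part of any Reducto tooling: the environment keys (`prod-us`, `prod-eu`, `prod-au`) and the `render_tfvars` helper are hypothetical names chosen for illustration, while the ARNs, service names, regions, DNS names, and Terraform variable names are copied from this page.

```python
import json

# Values copied from the tables above. The dictionary keys and the function
# name are illustrative assumptions, not an official Reducto interface.
REDUCTO_ENVIRONMENTS = {
    "prod-us": {
        "principal_arns": [
            "arn:aws:iam::731106932034:role/reducto-prod",
            "arn:aws:iam::731106932034:user/modal-prod",
        ],
        "endpoint_service_name": "com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8",
        "endpoint_service_region": "us-west-2",
        "dns_name": "hybrid.platform.reducto.ai",
    },
    "prod-eu": {
        "principal_arns": [
            "arn:aws:iam::731106932034:role/reducto-prod-eu",
            "arn:aws:iam::731106932034:user/modal-prod-eu",
        ],
        "endpoint_service_name": "com.amazonaws.vpce.eu-central-1.vpce-svc-0a231d441f3a482a0",
        "endpoint_service_region": "eu-central-1",
        "dns_name": "hybrid.eu.platform.reducto.ai",
    },
    "prod-au": {
        "principal_arns": [
            "arn:aws:iam::731106932034:role/reducto-prod-au",
            "arn:aws:iam::731106932034:user/modal-prod-au",
        ],
        "endpoint_service_name": "com.amazonaws.vpce.ap-southeast-2.vpce-svc-0da3ceba709035c36",
        "endpoint_service_region": "ap-southeast-2",
        "dns_name": "hybrid.au.platform.reducto.ai",
    },
}


def render_tfvars(environment: str, external_id: str, privatelink: bool = False) -> str:
    """Render the Reducto-specific terraform.tfvars lines for one environment."""
    if environment not in REDUCTO_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    env = REDUCTO_ENVIRONMENTS[environment]
    lines = [
        # A JSON array of plain strings is also a valid HCL list literal.
        f"reducto_principal_arns = {json.dumps(env['principal_arns'])}",
        f"reducto_external_id = {json.dumps(external_id)}",
    ]
    if privatelink:
        lines += [
            "enable_privatelink = true",
            f"reducto_endpoint_service_name = {json.dumps(env['endpoint_service_name'])}",
            f"reducto_endpoint_service_region = {json.dumps(env['endpoint_service_region'])}",
        ]
    return "\n".join(lines) + "\n"
```

Calling `render_tfvars("prod-eu", "<external-id-from-reducto>", privatelink=True)` returns only the Reducto-provided values. Settings specific to your account, such as `vpc_id` and `subnet_ids`, still need to be added by hand.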

## Setup

<Steps>
  <Step title="Clone the infrastructure repository">
    ```bash theme={null}
    git clone https://github.com/reductoai-collab/reducto-hybrid-infra.git
    cd reducto-hybrid-infra
    ```
  </Step>

  <Step title="Create terraform.tfvars">
    ```hcl theme={null}
    name_prefix = "reducto"

    # Use the appropriate Principal ARNs for your region
    reducto_principal_arns = [
      "arn:aws:iam::731106932034:role/reducto-prod",
      "arn:aws:iam::731106932034:user/modal-prod"
    ]
    reducto_external_id = "<external-id-from-reducto>"

    # Optional: customize bucket name (auto-generated if not set)
    # bucket_name = "my-company-reducto-data"

    # Optional: customize object retention (default: 1 day)
    # lifecycle_expiration_days = 1

    tags = {
      Environment = "production"
      Project     = "reducto-hybrid"
    }
    ```
  </Step>

  <Step title="Initialize and apply">
    ```bash theme={null}
    terraform init
    terraform plan
    terraform apply
    ```
  </Step>

  <Step title="Share outputs with Reducto">
    ```bash theme={null}
    terraform output integration_values
    ```

    Example output:

    ```json theme={null}
    {
      "bucket_name": "reducto-data-a1b2c3d4",
      "region": "us-east-1",
      "role_arn": "arn:aws:iam::987654321098:role/reducto-access",
      "access_mode": "assume_role",
      "privatelink_endpoint_id": null
    }
    ```
  </Step>
</Steps>

### Components provisioned

| Component    | Purpose                                                     | Required |
| ------------ | ----------------------------------------------------------- | -------- |
| S3 Bucket    | Document and artifact storage with configurable lifecycle   | Yes      |
| IAM Role     | Cross-account access for Reducto with ExternalId protection | Yes      |
| VPC Endpoint | PrivateLink endpoint for private API access                 | Optional |

## Access Modes

<Tabs>
  <Tab title="Assume Role (recommended)">
    The default and recommended access mode. Reducto assumes an IAM role in your account with ExternalId protection.

    ```hcl theme={null}
    access_mode         = "assume_role"
    reducto_external_id = "your-external-id-from-reducto"
    ```

    **Benefits:**

    * ExternalId prevents confused deputy attacks
    * Fine-grained permission control
    * Easy credential rotation
  </Tab>

  <Tab title="Bucket Policy">
    An alternative mode that grants Reducto's principals direct access through the bucket policy. It is simpler to set up but has no ExternalId protection.

    ```hcl theme={null}
    access_mode = "bucket_policy"
    ```
  </Tab>
</Tabs>

## PrivateLink Setup (Optional)

For private-only API access without traversing the public internet:

<Steps>
  <Step title="Request PrivateLink enablement">
    Provide the following to your Reducto team:

    * **AWS Account ID(s)**: Where you'll create the VPC endpoint
    * **Region(s)**: Where you need PrivateLink connectivity

    Reducto will enable the VPC Endpoint Service for your account(s). You'll receive confirmation to proceed.
  </Step>

  <Step title="Configure VPC endpoint">
    Add to your Terraform configuration:

    ```hcl theme={null}
    enable_privatelink              = true
    vpc_id                          = "vpc-0123456789abcdef0"
    subnet_ids                      = ["subnet-abc123", "subnet-def456"]
    reducto_endpoint_service_name   = "com.amazonaws.vpce.us-west-2.vpce-svc-0929182c8ed77b7a8"
    reducto_endpoint_service_region = "us-west-2"
    ```
  </Step>

  <Step title="Configure your client">
    Use the region-specific DNS name matching your VPC endpoint:

    ```python theme={null}
    from reducto import Reducto

    client = Reducto(
        api_key="your-api-key",
        base_url="https://hybrid.platform.reducto.ai"
    )
    ```
  </Step>
</Steps>

<Warning>
  You **must** enable private DNS resolution in your VPC endpoint configuration. Without it, the Reducto DNS name (for example, `hybrid.platform.reducto.ai`) does not resolve to your VPC endpoint from within your VPC.
</Warning>
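One way to confirm that private DNS is in effect is to resolve the Reducto DNS name from a host inside the VPC and check that every returned address is private. The sketch below assumes that, with private DNS enabled, the name resolves to the private IPs of the endpoint's network interfaces, which is the usual behavior of AWS interface endpoints. The function name is hypothetical, and the `resolver` parameter exists only so the check can be exercised without network access.

```python
import ipaddress
import socket


def resolves_to_private_ips(hostname, port=443, resolver=socket.getaddrinfo):
    """Return True if the hostname resolves and every address is a private IP."""
    infos = resolver(hostname, port, proto=socket.IPPROTO_TCP)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    # Drop any IPv6 zone suffix ("%eth0") before parsing.
    addresses = {info[4][0].split("%")[0] for info in infos}
    return bool(addresses) and all(
        ipaddress.ip_address(address).is_private for address in addresses
    )


# Example, to be run from an instance inside the VPC:
#   resolves_to_private_ips("hybrid.platform.reducto.ai")
```

A `False` result from inside the VPC suggests that traffic would leave through a public address, so the endpoint's private DNS setting is worth rechecking.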

## Validation Checklist

After `terraform apply`, verify your setup:

* [ ] **Terraform apply succeeded** without errors
* [ ] **S3 bucket has lifecycle rule**:
  ```bash theme={null}
  aws s3api get-bucket-lifecycle-configuration --bucket your-bucket-name
  ```
* [ ] **S3 bucket blocks public access**: All public access settings should be blocked
* [ ] **IAM role trust policy is correct**: Verify Reducto principals and ExternalId condition
  ```bash theme={null}
  aws iam get-role --role-name reducto-access --query 'Role.AssumeRolePolicyDocument'
  ```
* [ ] **If PrivateLink enabled**: Endpoint status shows "available"
  ```bash theme={null}
  aws ec2 describe-vpc-endpoints --vpc-endpoint-ids vpce-xxx
  ```
* [ ] **Smoke test**: Run a small Reducto job and verify objects appear in the bucket
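
The trust policy check above can be scripted rather than read by eye. The following is a minimal sketch that inspects the JSON printed by the `aws iam get-role` command in the checklist. It assumes the trust policy uses the standard `StringEquals` / `sts:ExternalId` condition; the policy generated by the Terraform module may be structured differently, and `check_trust_policy` is a hypothetical helper name.

```python
def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def check_trust_policy(policy, expected_principals, external_id):
    """Return a list of problems; an empty list means the policy looks as expected."""
    problems = []
    allowed = set()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        if "sts:AssumeRole" not in _as_list(statement.get("Action", [])):
            continue
        principal = statement.get("Principal", {})
        # Principal is normally {"AWS": ...} but can also be a bare string such as "*".
        arns = _as_list(principal.get("AWS", [])) if isinstance(principal, dict) else [principal]
        allowed.update(arns)
        condition = statement.get("Condition", {}).get("StringEquals", {})
        if _as_list(condition.get("sts:ExternalId", [])) != [external_id]:
            problems.append(f"no matching sts:ExternalId condition for {sorted(arns)}")
    for arn in sorted(set(expected_principals) - allowed):
        problems.append(f"missing principal: {arn}")
    for arn in sorted(allowed - set(expected_principals)):
        problems.append(f"unexpected principal: {arn}")
    return problems
```

To use it, save the command output to a file, load it with `json.load`, and pass it in together with the Principal ARNs and ExternalId you received from Reducto.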

## Multi-Region Setup

Deploy separate infrastructure in each region with region-specific Principal ARNs:

```bash theme={null}
# US East
cd environments/us-east-1
terraform apply -var="name_prefix=reducto-us"

# EU (Frankfurt)
cd ../eu-central-1
terraform apply -var="name_prefix=reducto-eu"

# Asia Pacific (Sydney)
cd ../ap-southeast-2
terraform apply -var="name_prefix=reducto-au"
```

Each region requires its own IAM role with the region-specific Principal ARNs from the [Principal ARNs](#principal-arns) table.

## Multi-Environment Setup

For organizations with separate AWS accounts for dev/staging/prod:

```
environments/
├── dev/
│   ├── main.tf
│   └── terraform.tfvars
├── staging/
│   ├── main.tf
│   └── terraform.tfvars
└── prod/
    ├── main.tf
    └── terraform.tfvars
```

Each environment should use a separate Terraform state file, its own S3 bucket and IAM role, and be registered separately with Reducto.

## Security

### ExternalId protection

The ExternalId in the IAM role trust policy prevents [confused deputy attacks](https://docs.aws.amazon.com/IAM/latest/UserGuide/confused-deputy.html). Only requests with the correct ExternalId can assume the role.
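
For reference, a trust policy with an ExternalId condition typically has the shape below. This is a sketch of the standard AWS pattern, not the exact document the Terraform module generates, and `build_trust_policy` is a hypothetical helper name.

```python
import json


def build_trust_policy(principal_arns, external_id):
    """Build a cross-account trust policy that requires a matching ExternalId."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Only these principals may call sts:AssumeRole on the role...
                "Principal": {"AWS": list(principal_arns)},
                "Action": "sts:AssumeRole",
                # ...and only when the request carries the agreed ExternalId.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = build_trust_policy(
    [
        "arn:aws:iam::731106932034:role/reducto-prod",
        "arn:aws:iam::731106932034:user/modal-prod",
    ],
    "<external-id-from-reducto>",
)
print(json.dumps(policy, indent=2))
```

An `sts:AssumeRole` call from a listed principal that omits the ExternalId, or supplies a different one, fails the condition and is denied.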

### Principle of least privilege

The IAM role grants only the permissions necessary for Reducto operations:

* `s3:GetObject` — Read documents
* `s3:PutObject` — Write results and artifacts
* `s3:DeleteObject` — Clean up temporary files
* `s3:ListBucket` — List objects for batch operations
* `s3:AbortMultipartUpload`, `s3:ListMultipartUploadParts` — Handle large file uploads
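
As an illustration, the permissions listed above could be expressed as an IAM policy scoped to a single bucket. This is a sketch under assumptions: the statement IDs and the `build_s3_access_policy` helper are hypothetical, and the policy created by the Terraform module may scope resources differently (for example, by prefix).

```python
def build_s3_access_policy(bucket_name):
    """Build a policy granting the listed S3 actions on one bucket only."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object-level actions apply to keys inside the bucket.
                "Sid": "ObjectAccess",
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Resource": f"{bucket_arn}/*",
            },
            {
                # s3:ListBucket is a bucket-level action, so it targets the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
        ],
    }
```

Splitting object-level and bucket-level actions into separate statements keeps each action attached to the resource type it applies to.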

### Automatic data cleanup

Objects expire automatically based on the bucket's lifecycle configuration (default: 1 day, set with `lifecycle_expiration_days`). This limits long-term data persistence, helps you meet retention policies, and cleans up intermediate artifacts without manual work.
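
Note that lifecycle expiration is not an exact timer. According to the Amazon S3 lifecycle documentation, S3 adds the configured number of days to the object's creation time and rounds the result up to the next midnight UTC, and removal itself happens asynchronously after that point. The sketch below estimates the earliest expiration time under that rule. The function name is hypothetical, and treating a result that lands exactly on midnight as already rounded is an assumption.

```python
from datetime import datetime, timedelta, timezone


def earliest_expiration(created_at, days=1):
    """Estimate when S3 marks an object as expired under a Days-based rule."""
    if created_at.tzinfo is None:
        raise ValueError("created_at must be timezone-aware")
    target = created_at.astimezone(timezone.utc) + timedelta(days=days)
    midnight = target.replace(hour=0, minute=0, second=0, microsecond=0)
    # Assumption: a time already at midnight UTC is not pushed a further day.
    return midnight if midnight == target else midnight + timedelta(days=1)


created = datetime(2024, 1, 15, 10, 30, tzinfo=timezone.utc)
print(earliest_expiration(created, days=1).isoformat())
```

With the default of 1 day, an object created at 10:30 UTC on January 15 becomes eligible for expiration at 00:00 UTC on January 17, so objects can remain in the bucket for somewhat longer than 24 hours.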
