# Hybrid VPC Deployment

> Deploy Reducto with your data in your own cloud account and compute managed by Reducto

Hybrid VPC deployment provides a balance between data sovereignty and operational simplicity. Your data stays in your cloud account while Reducto manages all compute infrastructure.

## Overview

In a Hybrid VPC deployment:

* **Data stays in your cloud account**: All documents, intermediate artifacts, and results are stored in your storage
* **Compute runs on Reducto's infrastructure**: GPU processing and model inference are handled by Reducto
* **Stateless by design**: Objects expire on a lifecycle you configure, so no data persists beyond the retention window you set
* **Multiple storage providers**: AWS S3, Azure Blob Storage, Box, and Google Cloud Storage are supported

<CardGroup cols={2}>
  <Card title="AWS S3" icon="aws" href="/onprem/hybrid-vpc-aws">
    Cross-account IAM role with ExternalId protection. Optional PrivateLink for private-only API access.
  </Card>

  <Card title="Azure Blob Storage" icon="microsoft" href="/onprem/hybrid-vpc-azure">
    Cross-tenant service principal access with RBAC. Standard Azure security model.
  </Card>

  <Card title="Box" icon="box" href="/onprem/hybrid-vpc-box">
    Box enterprise app with Client Credentials Grant. Ideal for organizations already using Box for document management.
  </Card>

  <Card title="Google Cloud Storage" icon="google" href="/onprem/hybrid-vpc-gcs">
    Cross-project service account access. Standard GCP IAM model.
  </Card>
</CardGroup>

### Key benefits

| Benefit               | Description                                                |
| --------------------- | ---------------------------------------------------------- |
| Data sovereignty      | Storage remains in your cloud account                      |
| No GPU management     | Offload model inference to Reducto's optimized GPU cluster |
| Cost efficiency       | Avoid provisioning and maintaining GPU capacity            |
| Fast auto-scaling     | Scale to zero when idle, scale up on demand                |
| Reduced DevOps burden | Faster iteration, no infrastructure maintenance            |

## Architecture

```mermaid theme={null}
flowchart LR
    subgraph customer["Customer Cloud Account"]
        direction TB
        storage["Object Storage<br/>(S3 / Azure Blob / Box)"]
        auth["Access Credentials<br/>(IAM Role / Service Principal / Box App)"]
        auth --> storage
    end

    subgraph reducto["Reducto Infrastructure"]
        direction TB
        workers["Compute Workers"]
        api["Reducto API + Database"]
        workers --> api
    end

    workers <--> storage
```

### Data flow

1. You upload documents to your storage (or use Reducto's `/upload` endpoint)
2. You call the Reducto API with a reference to your document
3. Reducto uses your configured credentials to access the document
4. Processing occurs on Reducto's compute infrastructure
5. Results and artifacts are written back to your storage
6. Objects expire automatically based on your lifecycle configuration
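
Step 6 depends on a lifecycle rule configured on your storage. As a minimal sketch for the S3 case, the helper below builds a rule in the shape accepted by S3's `PutBucketLifecycleConfiguration` API. The function name, the `reducto/` prefix, and the one-day retention are illustrative assumptions, not values Reducto prescribes; check the provider-specific guide for the recommended settings.

```python
import json


def build_expiration_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    Hypothetical helper: the prefix and retention period are assumptions.
    """
    if days < 1:
        raise ValueError("S3 expiration is expressed in whole days (minimum 1)")
    return {
        "Rules": [
            {
                # Rule ID is free-form; this naming scheme is only a convention.
                "ID": f"expire-{prefix.strip('/') or 'all'}-after-{days}d",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


lifecycle = build_expiration_rule(prefix="reducto/", days=1)
print(json.dumps(lifecycle, indent=2))
```

The resulting dictionary can be passed as `LifecycleConfiguration` to boto3's `put_bucket_lifecycle_configuration`, or translated into the equivalent Terraform resource.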

## Choosing a Storage Provider

<AccordionGroup>
  <Accordion title="AWS S3 — Recommended for AWS-native organizations">
    Best choice if your organization already uses AWS. Provides cross-account IAM role assumption with ExternalId protection against confused deputy attacks. Optional AWS PrivateLink keeps all traffic off the public internet. Terraform module provided for automated setup.
  </Accordion>

  <Accordion title="Azure Blob Storage — For Azure-native organizations">
    Best choice if your organization uses Azure. Uses cross-tenant service principal with RBAC role assignments. Terraform configuration provided for automated setup.
  </Accordion>

  <Accordion title="Box — For Box-first document workflows">
    Best choice if your organization already manages documents in Box. Uses Box enterprise app authentication (Client Credentials Grant). No Terraform provider available — setup is done through the Box Admin Console.
  </Accordion>

  <Accordion title="Google Cloud Storage — For GCP-native organizations">
    Best choice if your organization uses GCP. Uses cross-project service account access with IAM bindings. Contact Reducto for setup guidance.
  </Accordion>
</AccordionGroup>

## Document Handoff

Document handoff is a trust boundary. Only trusted services should submit document URLs or uploads to Reducto. A caller that can submit arbitrary URLs can ask Reducto to fetch any network location reachable from the deployment or hybrid worker environment. Use your gateway, storage policy, and egress policy to limit what callers can request.

There are multiple ways to provide documents to Reducto APIs, regardless of which storage provider you use:

<Tabs>
  <Tab title="Upload Endpoint">
    Use Reducto's `/upload` endpoint to upload documents directly. Files are automatically stored in your configured storage:

    ```python theme={null}
    from pathlib import Path
    from reducto import Reducto

    client = Reducto(api_key="your-api-key")

    upload_response = client.upload(file=Path("contract.pdf"))
    result = client.parse.run(document_url=upload_response.url)
    ```
  </Tab>

  <Tab title="Presigned / Shared URL">
    Generate a temporary URL from your storage provider and pass it to Reducto:

    ```python theme={null}
    from reducto import Reducto

    # Generate a presigned/shared URL from your storage provider
    # (S3 presigned URL, Azure SAS URL, or Box shared link)
    document_url = "https://..."

    client = Reducto(api_key="your-api-key")
    result = client.parse.run(document_url=document_url)
    ```
  </Tab>

  <Tab title="Direct URI (S3 only)">
    For AWS S3, pass an S3 URI directly. Reducto will use the configured IAM role to access the object:

    ```python theme={null}
    from reducto import Reducto

    client = Reducto(api_key="your-api-key")
    result = client.parse.run(document_url="s3://your-bucket/documents/contract.pdf")
    ```
  </Tab>
</Tabs>

For production workflows:

* Prefer direct uploads, `reducto://` file IDs, or tightly scoped storage-provider URLs.
* Keep presigned URLs short-lived and scoped to a single object.
* Do not pass URLs containing long-lived credentials.
* Restrict worker egress to expected document sources when your workflow allows it.
* Block cloud metadata endpoints and internal admin services from document-fetching egress.
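
One way to apply these rules at your gateway is to validate each document URL before forwarding it to Reducto. The sketch below uses only the Python standard library. The allowed schemes, the host allowlist, and the function name are assumptions to adapt to your own storage layout. It inspects the URL text only and does not resolve DNS, so it complements network-level egress controls rather than replacing them.

```python
import ipaddress
from urllib.parse import urlsplit

# Assumed allowlist: replace with the storage hosts your workflow actually uses.
ALLOWED_HOST_SUFFIXES = (".s3.amazonaws.com", ".blob.core.windows.net")
ALLOWED_SCHEMES = {"https", "s3", "reducto"}


def is_allowed_document_url(url: str) -> bool:
    """Return True if `url` looks like an expected document source."""
    parts = urlsplit(url)
    if parts.scheme not in ALLOWED_SCHEMES:
        return False
    if parts.scheme in {"s3", "reducto"}:
        # Non-HTTP references carry no network host; only require a bucket or file ID.
        return bool(parts.netloc)
    host = parts.hostname or ""
    try:
        # Reject literal IPs outright. This covers 169.254.169.254 and private
        # ranges even if the hostname allowlist is loosened later.
        ipaddress.ip_address(host)
        return False
    except ValueError:
        pass
    return host.endswith(ALLOWED_HOST_SUFFIXES)


print(is_allowed_document_url("https://my-bucket.s3.amazonaws.com/contract.pdf"))
print(is_allowed_document_url("https://169.254.169.254/latest/meta-data/"))
```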

<Note>
  For PrivateLink connections (AWS only), specify the region-specific hybrid endpoint as `base_url`:

  * **US**: `https://hybrid.platform.reducto.ai`
  * **EU**: `https://hybrid.eu.platform.reducto.ai`
  * **AU**: `https://hybrid.au.platform.reducto.ai`
</Note>
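
As a minimal sketch, the lookup below selects the hybrid endpoint for a deployment area. The helper name and the lowercase region keys are illustrative. The URLs are the ones listed in the note above, and passing the result as `base_url` to the client follows that note.

```python
# Region-specific hybrid endpoints, copied from the note above.
HYBRID_ENDPOINTS = {
    "us": "https://hybrid.platform.reducto.ai",
    "eu": "https://hybrid.eu.platform.reducto.ai",
    "au": "https://hybrid.au.platform.reducto.ai",
}


def hybrid_base_url(region: str) -> str:
    """Return the PrivateLink hybrid endpoint for a region code (hypothetical helper)."""
    key = region.strip().lower()
    if key not in HYBRID_ENDPOINTS:
        raise ValueError(f"Unknown region {region!r}; expected one of {sorted(HYBRID_ENDPOINTS)}")
    return HYBRID_ENDPOINTS[key]


# Usage with the SDK (requires the `reducto` package):
# from reducto import Reducto
# client = Reducto(api_key="your-api-key", base_url=hybrid_base_url("eu"))
print(hybrid_base_url("EU"))
```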

## Integration Contract

After setting up your storage infrastructure, provide the following values to Reducto:

<Tabs>
  <Tab title="AWS S3">
    | Value                     | Description                                                              |
    | ------------------------- | ------------------------------------------------------------------------ |
    | `bucket_name`             | S3 bucket name                                                           |
    | `region`                  | AWS region (e.g., `us-east-1`)                                           |
    | `role_arn`                | IAM role ARN for Reducto to assume                                       |
    | `external_id`             | ExternalId for secure role assumption                                    |
    | `privatelink_endpoint_id` | VPC Endpoint ID (if using PrivateLink)                                   |
    | AWS account ID(s)         | Every AWS account that will create a VPC endpoint (if using PrivateLink) |
  </Tab>

  <Tab title="Azure">
    | Value                  | Description                              |
    | ---------------------- | ---------------------------------------- |
    | `storage_account_name` | Azure Storage account name               |
    | `container_name`       | Blob container name                      |
    | `connection_string`    | Storage connection string (or SAS token) |
  </Tab>

  <Tab title="Box">
    | Value           | Description                           |
    | --------------- | ------------------------------------- |
    | `client_id`     | Box app client ID                     |
    | `client_secret` | Box app client secret                 |
    | `enterprise_id` | Box enterprise ID                     |
    | `folder_id`     | Target folder ID for document storage |
  </Tab>
</Tabs>

## Multi-Region Setup

For organizations needing storage in multiple regions for latency or compliance requirements, see the provider-specific setup guides linked above. Each provider supports region-specific configurations that Reducto routes automatically based on the deployment area (US, EU, AU).

## Multiple Environments

If one Reducto organization needs multiple dedicated storage locations, Reducto can register named Hybrid VPC environments under the same org. Each environment points to one bucket, IAM role, and region. You then select the environment on each request:

```python theme={null}
from reducto import Reducto

client = Reducto(api_key="your-api-key")

result = client.parse.run(
    document_url="s3://client-a-bucket/documents/contract.pdf",
    settings={
        "hybrid_vpc": {
            "environment": "client-a"
        }
    }
)
```

Use separate Reducto orgs only when you need separate API keys, admins, billing, quotas, or customer-visible tenancy. For client-specific buckets inside the same customer account, named environments are the recommended path.
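
When many client buckets share one org, it can help to derive the environment from the document URI instead of hard-coding it at each call site. The sketch below is one possible approach: the bucket-to-environment mapping and the helper name are hypothetical, and the `settings` shape mirrors the per-request example earlier in this section.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from bucket name to the environment name registered with Reducto.
ENVIRONMENT_BY_BUCKET = {
    "client-a-bucket": "client-a",
    "client-b-bucket": "client-b",
}


def parse_kwargs_for(document_url: str) -> dict:
    """Build keyword arguments for `client.parse.run` from an s3:// URI."""
    bucket = urlsplit(document_url).netloc
    try:
        environment = ENVIRONMENT_BY_BUCKET[bucket]
    except KeyError:
        raise ValueError(f"No Hybrid VPC environment registered for bucket {bucket!r}") from None
    return {
        "document_url": document_url,
        "settings": {"hybrid_vpc": {"environment": environment}},
    }


# Usage with the SDK (requires the `reducto` package):
# result = client.parse.run(**parse_kwargs_for("s3://client-a-bucket/documents/contract.pdf"))
print(parse_kwargs_for("s3://client-a-bucket/documents/contract.pdf"))
```

Failing on an unmapped bucket, rather than falling back to a default environment, keeps a misrouted request from touching the wrong client's storage.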

## Security

All storage integrations follow least-privilege principles:

* **AWS**: ExternalId prevents confused deputy attacks; IAM policy limits access to S3 operations only
* **Azure**: RBAC role assignment scoped to the specific storage account/container
* **Box**: App access restricted to the configured folder; enterprise admin approval required
* **All providers**: Automatic data cleanup via configurable lifecycle policies
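
To make the ExternalId control concrete, the sketch below builds the general shape of an IAM trust policy that permits role assumption only when the caller supplies the agreed ExternalId. The principal ARN and ExternalId are placeholders, and the helper name is hypothetical; use the values and Terraform module from the AWS setup guide for a real deployment.

```python
import json


def build_trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """Return an IAM trust policy requiring `external_id` on sts:AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                # Without a matching ExternalId, the AssumeRole call is denied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


policy = build_trust_policy(
    trusted_principal_arn="arn:aws:iam::111122223333:root",  # placeholder account ID
    external_id="example-external-id",  # placeholder value
)
print(json.dumps(policy, indent=2))
```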
