# Hybrid VPC Deployment

> Deploy Reducto with your data in your own cloud account and compute managed by Reducto

Hybrid VPC deployment provides a balance between data sovereignty and operational simplicity. Your data stays in your cloud account while Reducto manages all compute infrastructure.

## Overview

In a Hybrid VPC deployment:

* **Data stays in your cloud account**: All documents, intermediate artifacts, and results are stored in your storage
* **Compute runs on Reducto's infrastructure**: GPU processing and model inference are handled by Reducto
* **Stateless by design**: Objects have a configurable lifecycle, so nothing persists in storage beyond the retention period you set
* **Multiple storage providers**: AWS S3, Azure Blob Storage, Box, and Google Cloud Storage are supported

<CardGroup cols={2}>
  <Card title="AWS S3" icon="aws" href="/onprem/hybrid-vpc-aws">
    Cross-account IAM role with ExternalId protection. Optional PrivateLink for private-only API access.
  </Card>

  <Card title="Azure Blob Storage" icon="microsoft" href="/onprem/hybrid-vpc-azure">
    Cross-tenant service principal access with RBAC. Standard Azure security model.
  </Card>

  <Card title="Box" icon="box" href="/onprem/hybrid-vpc-box">
    Box enterprise app with Client Credentials Grant. Ideal for organizations already using Box for document management.
  </Card>

  <Card title="Google Cloud Storage" icon="google" href="/onprem/hybrid-vpc-gcs">
    Cross-project service account access. Standard GCP IAM model.
  </Card>
</CardGroup>

### Key benefits

| Benefit               | Description                                                |
| --------------------- | ---------------------------------------------------------- |
| Data sovereignty      | Storage remains in your cloud account                      |
| No GPU management     | Offload model inference to Reducto's optimized GPU cluster |
| Cost efficiency       | Avoid provisioning and maintaining GPU capacity            |
| Fast auto-scaling     | Scale to zero when idle, scale up on demand                |
| Reduced DevOps burden | Faster iteration, no infrastructure maintenance            |

## Architecture

```mermaid theme={null}
flowchart LR
    subgraph customer["Customer Cloud Account"]
        direction TB
        storage["Object Storage<br/>(S3 / Azure Blob / Box / GCS)"]
        auth["Access Credentials<br/>(IAM Role / Service Principal / Box App / Service Account)"]
        auth --> storage
    end

    subgraph reducto["Reducto Infrastructure"]
        direction TB
        workers["Compute Workers"]
        api["Reducto API + Database"]
        workers --> api
    end

    workers <--> storage
```

### Data flow

1. You upload documents to your storage (or use Reducto's `/upload` endpoint)
2. You call the Reducto API with a reference to your document
3. Reducto uses your configured credentials to access the document
4. Processing occurs on Reducto's compute infrastructure
5. Results and artifacts are written back to your storage
6. Objects expire automatically based on your lifecycle configuration
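
Step 6 depends on expiry rules configured on the storage side. As a minimal sketch for S3, the snippet below builds a lifecycle configuration in the shape S3's lifecycle API accepts. The `build_lifecycle_config` helper, the `reducto/` prefix, and the 1-day retention are illustrative assumptions, not values prescribed by Reducto.

```python
import json


def build_lifecycle_config(prefix: str, expire_after_days: int) -> dict:
    """Build an S3 lifecycle configuration that expires objects under a prefix."""
    if expire_after_days < 1:
        raise ValueError("S3 expiration is expressed in whole days (minimum 1)")
    return {
        "Rules": [
            {
                "ID": f"expire-{prefix.strip('/') or 'all'}-objects",
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


# Hypothetical values: the prefix and retention period are assumptions.
lifecycle_config = build_lifecycle_config(prefix="reducto/", expire_after_days=1)
print(json.dumps(lifecycle_config, indent=2))
```

If you manage the bucket with boto3, a dictionary like this can be passed as `LifecycleConfiguration` to `put_bucket_lifecycle_configuration`. If you use the provided Terraform module, check whether it already manages lifecycle rules before adding your own.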

## Choosing a Storage Provider

<AccordionGroup>
  <Accordion title="AWS S3 — Recommended for AWS-native organizations">
    Best choice if your organization already uses AWS. Provides cross-account IAM role assumption with ExternalId protection against confused deputy attacks. Optional AWS PrivateLink keeps all traffic off the public internet. Terraform module provided for automated setup.
  </Accordion>

  <Accordion title="Azure Blob Storage — For Azure-native organizations">
    Best choice if your organization uses Azure. Uses cross-tenant service principal with RBAC role assignments. Terraform configuration provided for automated setup.
  </Accordion>

  <Accordion title="Box — For Box-first document workflows">
    Best choice if your organization already manages documents in Box. Uses Box enterprise app authentication (Client Credentials Grant). No Terraform provider available — setup is done through the Box Admin Console.
  </Accordion>

  <Accordion title="Google Cloud Storage — For GCP-native organizations">
    Best choice if your organization uses GCP. Uses cross-project service account access with IAM bindings. Contact Reducto for setup guidance.
  </Accordion>
</AccordionGroup>

## Document Handoff

You can provide documents to the Reducto API in several ways, regardless of which storage provider you use:

<Tabs>
  <Tab title="Upload Endpoint">
    Use Reducto's `/upload` endpoint to upload documents directly. Files are automatically stored in your configured storage:

    ```python theme={null}
    from pathlib import Path
    from reducto import Reducto

    client = Reducto(api_key="your-api-key")

    upload_response = client.upload(file=Path("contract.pdf"))
    result = client.parse.run(document_url=upload_response.url)
    ```
  </Tab>

  <Tab title="Presigned / Shared URL">
    Generate a temporary URL from your storage provider and pass it to Reducto:

    ```python theme={null}
    from reducto import Reducto

    # Generate a presigned/shared URL from your storage provider
    # (S3 presigned URL, Azure SAS URL, or Box shared link)
    document_url = "https://..."

    client = Reducto(api_key="your-api-key")
    result = client.parse.run(document_url=document_url)
    ```
  </Tab>

  <Tab title="Direct URI (S3 only)">
    For AWS S3, pass an S3 URI directly. Reducto uses the configured IAM role to access the object:

    ```python theme={null}
    from reducto import Reducto

    client = Reducto(api_key="your-api-key")
    result = client.parse.run(document_url="s3://your-bucket/documents/contract.pdf")
    ```
  </Tab>
</Tabs>
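
When an application mixes these handoff styles, it can help to build and check S3 URIs before sending them. The sketch below uses only the Python standard library; `s3_uri` and `is_s3_uri` are hypothetical helpers, not part of the Reducto SDK.

```python
from urllib.parse import urlparse


def s3_uri(bucket: str, key: str) -> str:
    """Join a bucket name and object key into an s3:// URI."""
    if not bucket or "/" in bucket:
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return f"s3://{bucket}/{key.lstrip('/')}"


def is_s3_uri(document_url: str) -> bool:
    """Return True when the reference uses the s3:// scheme and names a bucket and a key."""
    parsed = urlparse(document_url)
    return parsed.scheme == "s3" and bool(parsed.netloc) and len(parsed.path) > 1


# Bucket and key are the placeholder values from the example above.
document_url = s3_uri("your-bucket", "documents/contract.pdf")
print(document_url, is_s3_uri(document_url))
```

The resulting string is what you would pass as `document_url` in the Direct URI example; presigned and shared URLs use `https://` and are passed through unchanged.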

<Note>
  For PrivateLink connections (AWS only), specify the region-specific hybrid endpoint as `base_url`:

  * **US**: `https://hybrid.platform.reducto.ai`
  * **EU**: `https://hybrid.eu.platform.reducto.ai`
  * **AU**: `https://hybrid.au.platform.reducto.ai`
</Note>
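
One way to keep the endpoint choice in a single place is a small lookup keyed by deployment area. This is a sketch: the `hybrid_base_url` helper and the `us`/`eu`/`au` keys are assumptions, while the URLs are the ones listed in the note above.

```python
# Hybrid endpoints from the note above, keyed by a hypothetical region code.
HYBRID_BASE_URLS = {
    "us": "https://hybrid.platform.reducto.ai",
    "eu": "https://hybrid.eu.platform.reducto.ai",
    "au": "https://hybrid.au.platform.reducto.ai",
}


def hybrid_base_url(region: str) -> str:
    """Return the hybrid endpoint for a deployment area, ignoring case and whitespace."""
    key = region.strip().lower()
    if key not in HYBRID_BASE_URLS:
        supported = ", ".join(sorted(HYBRID_BASE_URLS))
        raise ValueError(f"unknown region {region!r}; expected one of: {supported}")
    return HYBRID_BASE_URLS[key]


print(hybrid_base_url("EU"))
```

The returned value is what you would supply as `base_url` when constructing the client, for example `Reducto(api_key="your-api-key", base_url=hybrid_base_url("eu"))`. Passing `base_url` as a constructor keyword is an assumption based on the note above.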

## Integration Contract

After setting up your storage infrastructure, provide the following values to Reducto:

<Tabs>
  <Tab title="AWS S3">
    | Value                     | Description                            |
    | ------------------------- | -------------------------------------- |
    | `bucket_name`             | S3 bucket name                         |
    | `region`                  | AWS region (e.g., `us-east-1`)         |
    | `role_arn`                | IAM role ARN for Reducto to assume     |
    | `external_id`             | ExternalId for secure role assumption  |
    | `privatelink_endpoint_id` | VPC Endpoint ID (if using PrivateLink) |
  </Tab>

  <Tab title="Azure">
    | Value                  | Description                              |
    | ---------------------- | ---------------------------------------- |
    | `storage_account_name` | Azure Storage account name               |
    | `container_name`       | Blob container name                      |
    | `connection_string`    | Storage connection string (or SAS token) |
  </Tab>

  <Tab title="Box">
    | Value           | Description                           |
    | --------------- | ------------------------------------- |
    | `client_id`     | Box app client ID                     |
    | `client_secret` | Box app client secret                 |
    | `enterprise_id` | Box enterprise ID                     |
    | `folder_id`     | Target folder ID for document storage |
  </Tab>
</Tabs>
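
Before sending these values to Reducto, a quick local sanity check can catch copy-paste mistakes. The sketch below covers the AWS S3 values only. The `AwsIntegrationValues` class and its checks are hypothetical and reflect general AWS naming conventions (IAM role ARN format, `vpce-` prefix for VPC endpoint IDs), not validation that Reducto performs.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# General shape of an IAM role ARN: arn:<partition>:iam::<12-digit account>:role/<name>
ROLE_ARN_PATTERN = re.compile(r"^arn:aws[a-z-]*:iam::\d{12}:role/.+$")


@dataclass(frozen=True)
class AwsIntegrationValues:
    bucket_name: str
    region: str
    role_arn: str
    external_id: str
    privatelink_endpoint_id: Optional[str] = None  # only needed when using PrivateLink

    def problems(self) -> List[str]:
        """Return human-readable problems; an empty list means the values look plausible."""
        found = []
        for name in ("bucket_name", "region", "external_id"):
            if not getattr(self, name).strip():
                found.append(f"{name} is empty")
        if not ROLE_ARN_PATTERN.match(self.role_arn):
            found.append("role_arn does not look like an IAM role ARN")
        endpoint = self.privatelink_endpoint_id
        if endpoint is not None and not endpoint.startswith("vpce-"):
            found.append("privatelink_endpoint_id should start with 'vpce-'")
        return found


# Placeholder values for illustration only.
values = AwsIntegrationValues(
    bucket_name="your-bucket",
    region="us-east-1",
    role_arn="arn:aws:iam::123456789012:role/reducto-access",
    external_id="example-external-id",
)
print(values.problems())
```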

## Multi-Region Setup

If you need storage in multiple regions for latency or compliance reasons, see the provider-specific setup guides linked above. Each provider supports region-specific configurations, and Reducto routes requests to the matching configuration based on the deployment area (US, EU, or AU).

## Security

All storage integrations follow least-privilege principles:

* **AWS**: ExternalId prevents confused deputy attacks; IAM policy limits access to S3 operations only
* **Azure**: RBAC role assignment scoped to the specific storage account/container
* **Box**: App access restricted to the configured folder; enterprise admin approval required
* **All providers**: Automatic data cleanup via configurable lifecycle policies
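
To illustrate the AWS item above, the sketch below builds a standard IAM trust policy that allows role assumption only when the caller supplies the agreed ExternalId. The `build_trust_policy` helper is hypothetical, and the principal ARN and ExternalId are placeholders; the real values come from your Reducto onboarding and the AWS setup guide.

```python
import json


def build_trust_policy(reducto_principal_arn: str, external_id: str) -> dict:
    """Build an IAM trust policy that requires a matching sts:ExternalId on AssumeRole."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": reducto_principal_arn},
                "Action": "sts:AssumeRole",
                # Without the matching ExternalId, the AssumeRole call is denied.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder principal and ExternalId, for illustration only.
trust_policy = build_trust_policy(
    reducto_principal_arn="arn:aws:iam::111122223333:root",
    external_id="example-external-id",
)
print(json.dumps(trust_policy, indent=2))
```

The ExternalId condition is what blocks a confused deputy attack: a third party who learns your role ARN still cannot have the role assumed on their behalf without the shared ExternalId.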
