
Connection architecture

Each Reducto pod (HTTP server or worker) maintains its own SQLAlchemy connection pool to PostgreSQL. Connections are not shared across pods.
Pod type   Processes per pod              Pool size per process       Max connections per pod
HTTP       8 gunicorn workers (default)   pool_size + max_overflow    8 × (4 + 8) = 96
Worker     1                              pool_size + max_overflow    4 + 8 = 12
The number of HTTP workers is controlled by the HTTP_WORKERS environment variable (default: 8).
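The per-pod bound in the table above is just processes × (pool_size + max_overflow). A quick back-of-the-envelope helper (the function name is ours, not part of Reducto):

```python
def max_connections_per_pod(processes: int, pool_size: int = 4, max_overflow: int = 8) -> int:
    """Upper bound on PostgreSQL connections one pod can open:
    each process can fill its persistent pool plus its overflow."""
    return processes * (pool_size + max_overflow)

# HTTP pod with the default 8 gunicorn workers:
max_connections_per_pod(8)   # 8 × (4 + 8) = 96
# Worker pod (single process):
max_connections_per_pod(1)   # 4 + 8 = 12
```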

Default pool settings

These are the application-level defaults for on-premise deployments:
Setting                   Default         Description
DB_POOL_SIZE              4               Persistent connections kept open per process
DB_MAX_OVERFLOW           8               Additional connections created under load (closed when idle)
DB_POOL_TIMEOUT           30 (seconds)    Max time to wait for a connection from the pool
DB_POOL_PRE_PING          true            Validate connections before use (handles stale connections)
DB_POOL_RECYCLE           -1 (disabled)   Max connection lifetime in seconds before recycling
DB_LOCK_TIMEOUT_MS        15000           PostgreSQL lock_timeout per transaction
DB_STATEMENT_TIMEOUT_MS   20000           PostgreSQL statement_timeout per transaction
All settings can be overridden via environment variables in your Helm values.
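These settings map one-to-one onto SQLAlchemy's QueuePool parameters. A minimal sketch of how the env vars might be translated into `create_engine()` keyword arguments; the helper name and defaults dict are ours, and the actual wiring inside Reducto may differ:

```python
import os

# Defaults mirroring the table above (assumed, for illustration).
_DEFAULTS = {
    "DB_POOL_SIZE": "4",
    "DB_MAX_OVERFLOW": "8",
    "DB_POOL_TIMEOUT": "30",
    "DB_POOL_PRE_PING": "true",
    "DB_POOL_RECYCLE": "-1",
}

def pool_kwargs_from_env(env=os.environ):
    """Translate the env vars above into SQLAlchemy create_engine() kwargs."""
    get = lambda key: env.get(key, _DEFAULTS[key])
    return {
        "pool_size": int(get("DB_POOL_SIZE")),
        "max_overflow": int(get("DB_MAX_OVERFLOW")),
        "pool_timeout": int(get("DB_POOL_TIMEOUT")),
        "pool_pre_ping": get("DB_POOL_PRE_PING").lower() == "true",
        "pool_recycle": int(get("DB_POOL_RECYCLE")),   # -1 disables recycling
    }
```

The resulting dict can be splatted into `sqlalchemy.create_engine(url, **pool_kwargs_from_env())`.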

Connection pooling with PgBouncer or RDS Proxy

We strongly recommend running an external connection pooler between Reducto and PostgreSQL. Without one, connection storms during pod scaling (especially KEDA-driven autoscaling) can overwhelm the database.

Azure (built-in PgBouncer)

Azure Database for PostgreSQL Flexible Server includes a built-in PgBouncer. Our Azure on-prem Terraform module enables it by default:
variable "postgres_pgbouncer_enabled" {
  description = "Enable PgBouncer for built-in connection pooling"
  type        = bool
  default     = true
}
When enabled, the database URL automatically points to port 6432 (PgBouncer) instead of 5432 (direct PostgreSQL). No application-level changes are needed. To verify PgBouncer is active, check the Azure Portal under your PostgreSQL Flexible Server > Server parameters > pgbouncer.enabled.

AWS (RDS Proxy)

Our AWS on-prem Terraform module provisions an RDS Proxy by default. The Helm chart automatically uses the pooled database URL:
env:
  DATABASE_URL: <pooled_database_url via RDS Proxy>
RDS Proxy handles connection multiplexing, failover, and credential rotation transparently.

Estimating total database connections

To estimate your peak connection count:
Total connections = (HTTP pods × HTTP_WORKERS × (DB_POOL_SIZE + DB_MAX_OVERFLOW))
                  + (Worker pods × (DB_POOL_SIZE + DB_MAX_OVERFLOW))
Example with 2 HTTP pods and 10 worker pods (default settings):
HTTP:    2 × 8 × (4 + 8) = 192
Workers: 10 × (4 + 8)    = 120
Total:                      312
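The estimate above can be wrapped in a small helper for capacity planning (the function is illustrative, not part of Reducto):

```python
def total_peak_connections(http_pods: int, worker_pods: int,
                           http_workers: int = 8,
                           pool_size: int = 4, max_overflow: int = 8) -> int:
    """Peak application-side connection count across all pods,
    per the formula above."""
    per_process = pool_size + max_overflow
    return http_pods * http_workers * per_process + worker_pods * per_process

# 2 HTTP pods and 10 worker pods with default settings:
total_peak_connections(2, 10)   # 192 + 120 = 312
```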
Azure Flexible Server and AWS RDS have connection limits based on instance size. With a connection pooler, the actual backend connections will be much lower than this number, since the pooler multiplexes idle application connections onto fewer database connections.

Right-sizing for your workload

If you see connection timeout errors or pool exhaustion:
  1. Increase DB_POOL_SIZE if connections are frequently at capacity during steady state
  2. Increase DB_MAX_OVERFLOW if you see spikes during burst traffic
  3. Set DB_POOL_RECYCLE to a finite value (e.g., 300) if you’re behind a pooler that has its own idle timeout; this prevents the application from trying to use connections the pooler has already closed
If you are using an external pooler (PgBouncer or RDS Proxy), keep DB_POOL_PRE_PING set to true. This ensures the application validates connections before use, which is important when the pooler may close idle backend connections.

Timeout tuning

The lock_timeout and statement_timeout values are set per transaction using SET LOCAL, which is compatible with all connection poolers (direct, PgBouncer, RDS Proxy). If you process very large documents (100+ pages) and see timeout errors, you may want to increase these:
env:
  DB_STATEMENT_TIMEOUT_MS: "30000"  # 30 seconds
  DB_LOCK_TIMEOUT_MS: "20000"       # 20 seconds
Keep statement_timeout higher than lock_timeout so that lock contention surfaces as a lock timeout rather than a generic statement timeout.
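A sketch of the per-transaction SET LOCAL statements described above, with the statement_timeout > lock_timeout invariant enforced (the helper name is ours; Reducto's internal implementation is not shown in this doc):

```python
def transaction_timeout_sql(statement_timeout_ms: int = 20000,
                            lock_timeout_ms: int = 15000) -> list[str]:
    """Build the SET LOCAL statements to run at the start of a transaction.

    SET LOCAL scopes the setting to the current transaction only, which is
    why it stays safe behind PgBouncer or RDS Proxy (no session state leaks).
    """
    if statement_timeout_ms <= lock_timeout_ms:
        raise ValueError("statement_timeout should exceed lock_timeout so that "
                         "lock contention surfaces as a lock timeout")
    return [
        f"SET LOCAL lock_timeout = {lock_timeout_ms}",
        f"SET LOCAL statement_timeout = {statement_timeout_ms}",
    ]
```

Each statement would be executed inside the open transaction before the application's own queries.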