PostgreSQL, the open-source relational database, has long been a workhorse for enterprises, but few have pushed it as far as OpenAI. The company now runs ChatGPT and its API platform entirely on a single-primary PostgreSQL instance—no distributed cluster, no sharded architecture—processing millions of queries per second while maintaining five-nines availability and sub-10ms latency at the 99th percentile.
The setup is a direct rebuttal to the industry’s default playbook: when databases grow beyond a certain point, the advice is to shard, distribute, or migrate to a purpose-built distributed SQL system like CockroachDB. OpenAI’s approach flips that script. Instead of rewriting hundreds of endpoints or adopting new infrastructure, the team optimized PostgreSQL itself: connection pooling slashed connection time from 50ms to 5ms, and cache locking prevented ‘thundering herd’ problems, where many simultaneous cache misses would otherwise hammer the database at once.
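The cache-locking idea can be sketched in a few lines: serialize loads per key, so a miss triggers exactly one database fetch while concurrent readers wait for the result. This is an illustrative Python sketch under those assumptions, not OpenAI’s implementation; the class and its names are hypothetical.

```python
import threading

class LockedCache:
    """Cache that serializes loads per key, so a miss triggers only one
    backend fetch instead of a thundering herd of identical queries.
    (Hypothetical sketch; not OpenAI's actual code.)"""

    def __init__(self, loader):
        self._loader = loader          # function that hits the database
        self._values = {}
        self._locks = {}
        self._meta = threading.Lock()  # guards the per-key lock table
        self.load_count = 0            # how many real loads happened

    def get(self, key):
        if key in self._values:        # fast path: already cached
            return self._values[key]
        with self._meta:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                     # one thread loads; the rest wait here
            if key not in self._values:
                self.load_count += 1
                self._values[key] = self._loader(key)
        return self._values[key]
```

Without the per-key lock, every concurrent miss would issue its own database query; with it, N concurrent readers produce a single load.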
The workload itself is a key reason this works. ChatGPT’s primary database operations are read-heavy, with writes concentrated in specific areas. PostgreSQL’s multiversion concurrency control (MVCC) normally becomes a bottleneck under heavy writes: each update copies the entire row, leaving dead tuples that queries must skip until vacuum reclaims them. OpenAI structured its architecture around these tradeoffs: new write-heavy workloads default to sharded systems such as Azure Cosmos DB, while existing read-oriented operations stay in PostgreSQL, aggressively optimized.
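A read/write split like this is often implemented with a small router in front of the connection layer: all writes go to the single primary, while reads fan out across the replicas. The sketch below uses hypothetical DSN strings and a deliberately crude SQL classifier; it illustrates the routing idea, not OpenAI’s code.

```python
import itertools

class QueryRouter:
    """Route writes to the single primary and spread reads round-robin
    across read replicas. (Hypothetical sketch of a read/write split.)"""

    def __init__(self, primary_dsn, replica_dsns):
        self.primary = primary_dsn
        self._replicas = itertools.cycle(replica_dsns)

    def dsn_for(self, sql):
        # Crude classification: anything that is not a plain SELECT goes
        # to the primary so the single writer stays authoritative.
        if sql.lstrip().upper().startswith("SELECT"):
            return next(self._replicas)
        return self.primary
```

A production router would also handle read-your-writes consistency (replicas lag the primary), which is one reason the single-writer model stays simple only while writes remain light.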
This isn’t a call to replicate OpenAI’s stack verbatim. The lesson is broader: scaling isn’t about blindly adopting distributed systems or sharding at the first sign of growth. It’s about understanding workload patterns, identifying real bottlenecks, and optimizing incrementally. For enterprises, this means reviewing ORM-generated SQL in production—a habit OpenAI adopted after discovering a 12-table join query that caused multiple outages during traffic spikes—and enforcing strict operational discipline, such as prohibiting schema changes that trigger full table rewrites.
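One lightweight way to enforce that review habit is a static check on the SQL an ORM emits, flagging statements that join more tables than a policy allows. The helper and threshold below are hypothetical, but a 12-table join (eleven JOIN clauses) would trip the default limit.

```python
import re

def audit_sql(sql, max_joins=5):
    """Return (ok, join_count) for an ORM-generated statement.
    Flags queries joining more tables than the policy threshold.
    (Hypothetical audit rule, not OpenAI's actual tooling.)"""
    join_count = len(re.findall(r"\bJOIN\b", sql, re.IGNORECASE))
    return join_count <= max_joins, join_count
```

Wired into a test suite or a query-log sampler, a check like this surfaces pathological ORM output before a traffic spike does.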
OpenAI’s PostgreSQL deployment also highlights the hidden costs of premature distribution. Sharding and distributed databases eliminate single-writer bottlenecks but introduce complexity: application code must route queries correctly, distributed transactions become harder to manage, and operational overhead climbs. OpenAI’s hybrid strategy—migrating only the most problematic workloads while optimizing the rest—avoids these pitfalls.
For AI applications in particular, where read-heavy workloads with unpredictable spikes are common, PostgreSQL’s single-primary model can scale further than expected. The decision to shard should hinge on actual performance data, not user counts alone. OpenAI’s experience suggests that with deliberate optimization, proven systems can handle orders of magnitude more load than conventional wisdom assumes.
Key specs and optimizations
- Architecture: Single-primary PostgreSQL instance (Azure PostgreSQL Flexible Server) handling all writes.
- Read scaling: Nearly 50 read replicas across multiple regions.
- Performance: Millions of queries per second, sub-10ms p99 latency, five-nines availability.
- Optimizations: Connection pooling reduced latency from 50ms to 5ms; cache locking prevented ‘thundering herd’ issues.
- Workload isolation: New write-heavy workloads default to sharded systems (e.g., Azure Cosmos DB).
- Operational controls: Schema changes limited to lightweight updates that avoid full table rewrites; a five-second timeout on schema-change statements; aggressive rate limiting for backfills.
- ORM review: Production monitoring of ORM-generated SQL to catch inefficient queries early.
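The backfill rate limiting above can be approximated with a standard token bucket: backfill workers acquire a token per row (or batch) and back off when the bucket is empty, keeping bulk writes from crowding out production traffic. This is a generic sketch with hypothetical parameters, not OpenAI’s tooling.

```python
import time

class BackfillLimiter:
    """Token-bucket limiter for backfill writes: tokens refill at a steady
    rate up to a burst capacity. (Hypothetical sketch; parameters are
    illustrative, not OpenAI's settings.)"""

    def __init__(self, rows_per_sec, burst):
        self.rate = rows_per_sec
        self.capacity = burst
        self.tokens = burst
        self.last = time.monotonic()

    def acquire(self, n=1):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False  # caller should sleep and retry
```

A worker loop would call `acquire()` before each batch and sleep briefly on `False`, so the backfill yields to foreground queries instead of saturating the primary.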
This approach isn’t just about scale: it’s about operational pragmatism. OpenAI’s PostgreSQL deployment shows that when teams focus on real bottlenecks and optimize incrementally, even long-established, non-distributed systems can handle unprecedented loads without a full architectural overhaul.