Data center workloads just got faster. A new GPU has surfaced with a 20 percent performance boost in AI training tasks compared to its predecessor, though whether that gain justifies its higher power draw and cooling costs is still unclear.
The chip, codenamed Blackridge, is the latest evolution of a long-running family known for pushing boundaries in floating-point math. Its core design shifts focus from raw clock speeds to more efficient parallel execution, a move that could reshape how AI clusters are built but also raises questions about whether power budgets will need to be rethought.
Blackridge’s performance leap is most noticeable in mixed-precision training scenarios, where it delivers 20 percent better throughput than the previous generation. Under the hood, its CUDA cores now run at a base clock of 1.8 GHz with a boost capability up to 2.5 GHz, though sustained boosts are likely to be limited by thermal constraints in dense server racks.
- Performance: 20% faster AI training than predecessor
- CUDA cores: Base clock 1.8 GHz, boost up to 2.5 GHz
- Memory: 40 GB GDDR6X, 760 GB/s bandwidth
- TDP: 350 W (estimated)
- Architecture: New instruction set for mixed-precision operations
The jump in memory capacity, from 24 GB to 40 GB of GDDR6X, is a direct response to the growing demands of large language model training, where batch sizes have swollen beyond what earlier chips could handle efficiently. But the upgrade comes with a tradeoff: estimated thermal design power rises to 350 watts, up from 275 watts in the previous model. Whether data center operators will accept higher cooling costs for the performance gain remains an open question.
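To put that tradeoff in numbers, here is a rough back-of-envelope sketch. It assumes actual power draw tracks the estimated TDP figures, which real workloads rarely do exactly, and it treats the 20 percent mixed-precision gain as applying uniformly. The 10 kW rack power budget and the function names are hypothetical, chosen only for illustration.

```python
import math

# Figures quoted in the article. The TDP values are estimates, not measured draw.
PREV_TDP_W = 275.0
NEW_TDP_W = 350.0
THROUGHPUT_GAIN = 1.20  # 20 percent faster mixed-precision training

# Hypothetical power budget for the GPUs in one rack (illustration only).
HYPOTHETICAL_RACK_BUDGET_W = 10_000.0


def perf_per_watt_ratio(throughput_gain: float, old_watts: float, new_watts: float) -> float:
    """Relative throughput per watt of the new chip versus the old one.

    Assumes power draw scales with TDP, so a ratio below 1.0 means the
    speedup does not keep pace with the extra power.
    """
    return throughput_gain / (new_watts / old_watts)


def rack_throughput(budget_watts: float, tdp_watts: float, per_gpu_throughput: float) -> float:
    """Aggregate throughput of as many whole GPUs as fit in a fixed power budget.

    Throughput is normalized so the predecessor GPU counts as 1.0.
    """
    gpus = math.floor(budget_watts / tdp_watts)
    return gpus * per_gpu_throughput


if __name__ == "__main__":
    ratio = perf_per_watt_ratio(THROUGHPUT_GAIN, PREV_TDP_W, NEW_TDP_W)
    print(f"Throughput per watt vs. predecessor: {ratio:.3f}")

    old_rack = rack_throughput(HYPOTHETICAL_RACK_BUDGET_W, PREV_TDP_W, 1.0)
    new_rack = rack_throughput(HYPOTHETICAL_RACK_BUDGET_W, NEW_TDP_W, THROUGHPUT_GAIN)
    print(f"Power-capped rack, predecessor: {old_rack:.1f}")
    print(f"Power-capped rack, Blackridge:  {new_rack:.1f}")
```

Under these assumptions, throughput per watt drops by roughly 6 percent (1.20 divided by 350/275 is about 0.943). In a power-capped rack, 36 of the older chips would edge out 28 Blackridge parts, about 36 units of work versus about 33.6. Measured power draw, cooling overhead, and the new mixed-precision instructions could shift either figure.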
Blackridge’s instruction set introduces new mixed-precision operations that could further improve efficiency, but early benchmarks suggest these features may not fully offset the higher power draw in real-world deployments. The chip is expected to ship in volume later this year; pricing has not been confirmed.