Napier AI chip pushes token processing to 1,000/s, reshaping data-cent

Hardware Semiconductor Tensordyne’s 3nm Napier AI Chip Promises 13x Higher Token Throughput Than Blackwell & Blazes Past Rubin With 1000 Tokens/s In Multi-Trillion Parameter Models Hassan Mujtaba • at EDT Add on Google US-based AI company, Tensordyne, has announced the successful tape-out of its Napier chip, which it claims to demolish NVIDIA's Blackwell & Rubin chips with leading token throughput and efficiency. Tensordyne’s new Napier AI Chip arrives with one clear mission: to make NVIDIA’s Blackwell and Rubin chips look considerably less impressive The Napier chip will be the core component of the Tensordyne Napier TDN system, which is designed in collaboration with Broadcom and HPE Juniper Networks. The Napier platform has one goal: to unify AI through novel logarithmic AI math, a tightly integrated memory architecture, and a high-performance scale-up interconnect that drives higher token throughput at low power. Related Story FuriosaAI Ditches GPU Playbook For 2nm Broadcom-Built Inference Chip, Claims HBM4/E Bandwidth Beats Even The Most Efficient GPUsNapier is built on TSMC's 3nm process, and with its successful tape-out, the chip is now in production. With the primary milestone achievement, Tensordyne is now working towards beta deployment and a broader infrastructure plan that represents over $200 million in forecasted Napier system demand. And the key area of focus is AI inferencing. We just talked about how current AI infrastructure is constrained by power consumption, but to tackle these constraints, solutions such as 800V DC are going to incur a huge deployment cost. Infrastructures such as power and cooling alone make up 50% of the cost of major AI deployments, and to address these, Tensordyne has come up with a new inference stack across math, compute, memory, and networking: TDN Math (Logarithmic Mathematics) TDN replaces large-scale multiplication operations with simplified addition-based computation, significantly improving performance-per-watt efficiency across frontier AI models. TDN AIP (Artificial Intelligence Processor) Each TDN processor tightly integrates substantial fast SRAM alongside HBM memory, minimizing idle compute cycles and supporting efficient execution of the industry’s largest models. TDN Link (Any-to-Any Scale-Up Interconnect) Tensordyne’s proprietary scale-up fabric delivers sub-microsecond communication latency between processors, maximizing compute utilization and minimizing interconnect bottlenecks. All of this is brought together in Tensordyne's TDN72 Inference Pod and Rack system. Each Pod is fitted with 72 Napier AI chips, which are composed of NVIDIA's NVL72 rack with 72 Blackwell or Rubin GPUs. It requires way less infrastructure capacity, and a Napier Rack combines for TDN72 pods to deliver: 17x more tokens per watt (vs NVIDIA Blackwell) 13x more tokens per second (vs NVIDIA Blackwell) Up to $33 million more annual revenue per rack Tensordyne doesn't stop at just Blackwell comparison; they also compare the Napier solution against NVIDIA's upcoming Rubin platform. The company claims that its platform supports multi-trillion parameter models with a throughput of 1000 tokens/s per use in a single-rack configuration. To do the same, NVIDIA will require nine Rubin + Groq LPX racks. Tensordyne’s Napier platform represents a bold leap forward in AI inference. By delivering 17× more tokens per watt and 13× higher throughput than NVIDIA Blackwell, while matching the performance of nine Rubin-based racks in a single compact footprint, it shatters the traditional speed-versus-cost and power-versus-performance trade-offs. With dramatically lower infrastructure demands, up to $33 million more annual revenue per rack, and efficient scaling for multi-trillion parameter models, Napier doesn’t just compete with NVIDIA’s Blackwell & Rubin; it redefines what’s possible for next-generation AI deployment. About the : A Software Engineer by training and a PC enthusiast by passion, Hassan Mujtaba serves as 's for hardware section. With years of experience in the industry, he specializes in deep-dive technical analysis of next-generation CPU and GPU architectures, motherboards, and cooling solutions. His work involves not only breaking news on upcoming technologies but also extensive hands-on reviews and benchmarking. Follow on Google to get more of our news coverage in your feeds. Deal of the Day Further Reading 200MW of US-UAE’s Jointly Planned 5GW AI Campus Is Coming Online Soon, Powered by 1000s of Next-Gen Chips OpenAI Got The Whole AI Squad To Accelerate Large-Scale AI Training – AMD, NVIDIA, Intel, Microsoft & Broadcom All-In On MRC NVIDIA’s Nemotron 3 Super Tops The Open-Source AI Model Chart, Beating DeepSeek & GPT-OSS Meta Is Adding Tens of Millions of AWS Graviton Cores To Its Compute Portfolio As Agentic AI Becomes “Almost As Big a CPU Story As A GPU Story” Read all on Tensordyne’s 3nm Napier AI Chip Promises 13x Higher Token Throughput Than Blackwell & Blazes Past Rubin With 1000 Tokens/s In Multi-Trillion Parameter Models

Napier AI chip pushes token processing to 1,000/s, reshaping data-center economics

Key takeaways