Microsoft has announced the Maia 200, a new AI accelerator designed to redefine inference economics for hyperscale workloads. Built on TSMC’s advanced 3nm process, the chip integrates 216GB of HBM3e memory and specialized tensor cores optimized for FP8 and FP4 precision, positioning it as the company’s most capable in-house silicon for AI tasks.
The Maia 200’s performance hinges on a combination of low-precision compute and a revamped memory subsystem. Delivering more than 10 petaFLOPS of FP4 and more than 5 petaFLOPS of FP8 throughput, the chip operates within a 750W TDP, balancing raw throughput with efficiency. Its 272MB of on-chip SRAM and a dedicated DMA engine reduce bottlenecks in data movement, a critical factor for large language models (LLMs), where token generation during decoding is typically limited by memory bandwidth rather than by compute capacity.
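For a sense of scale, here is a back-of-the-envelope sketch of what those numbers imply for a single accelerator. The 70-billion-parameter dense model is an illustrative assumption, not a figure Microsoft has cited, and the estimate ignores activations, KV cache, and software overheads.

```python
# Rough per-accelerator sizing (illustrative: a hypothetical 70B dense model,
# weights assumed to dominate memory use; not Microsoft-published figures).
PARAMS = 70e9            # assumed model size
HBM_CAPACITY_GB = 216    # per-accelerator HBM3e capacity
HBM_BW_TBPS = 7.0        # HBM3e bandwidth, TB/s

for name, bytes_per_param in [("FP8", 1.0), ("FP4", 0.5)]:
    weights_gb = PARAMS * bytes_per_param / 1e9
    # Bandwidth-bound decode: each generated token streams the full weight set
    # from HBM once, so tokens/s per sequence is roughly bandwidth / weight bytes.
    tokens_per_s = HBM_BW_TBPS * 1e12 / (PARAMS * bytes_per_param)
    print(f"{name}: weights ~{weights_gb:.0f} GB of {HBM_CAPACITY_GB} GB HBM, "
          f"~{tokens_per_s:.0f} tokens/s per sequence (memory-bandwidth bound)")
```

Halving the precision from FP8 to FP4 halves both the weight footprint and the bytes moved per token, which is why the low-precision tensor cores matter as much for decode speed as for raw FLOPS.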
Key to its design is a two-tier scale-up network, leveraging standard Ethernet to connect accelerators without proprietary fabrics. Each Maia 200 exposes 2.8TB/s of bidirectional bandwidth, enabling clusters of up to 6,144 accelerators with predictable performance. This architecture simplifies deployment in Azure’s datacenters while reducing power and cost per inference operation.
- Process: TSMC 3nm
- Memory: 216GB HBM3e (7TB/s bandwidth)
- On-chip SRAM: 272MB
- Precision support: FP8/FP4 tensor cores
- Performance: >10 petaFLOPS FP4, >5 petaFLOPS FP8
- TDP: 750W
- Bandwidth: 2.8TB/s bidirectional per accelerator
- Networking: Ethernet-based scale-up fabric
- Cluster scale: Up to 6,144 accelerators
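Multiplying the listed per-accelerator figures out to a full 6,144-chip cluster gives a rough upper bound. The sketch below is simple arithmetic on the published numbers and ignores interconnect, cooling, and software overheads.

```python
# Cluster-level arithmetic from the spec list above (straight multiplication
# of per-accelerator figures; no efficiency or overhead factors applied).
ACCELERATORS = 6_144
FP4_PFLOPS = 10          # ">10 petaFLOPS" per accelerator, taken at the floor
SCALE_UP_BW_TBPS = 2.8   # bidirectional scale-up bandwidth per accelerator
TDP_KW = 0.75            # 750W per accelerator

print(f"Aggregate FP4 compute : >{ACCELERATORS * FP4_PFLOPS / 1000:.0f} exaFLOPS")
print(f"Aggregate scale-up BW : {ACCELERATORS * SCALE_UP_BW_TBPS / 1000:.1f} PB/s")
print(f"Accelerator power     : {ACCELERATORS * TDP_KW / 1000:.1f} MW (chips only)")
```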
The Maia 200 is already deployed in Microsoft’s US Central datacenter near Des Moines, Iowa, with US West 3 (Phoenix, Arizona) slated for rollout. It will support Azure’s AI infrastructure, including OpenAI’s GPT-5.2 models and Microsoft 365 Copilot, while also accelerating synthetic data generation for reinforcement learning pipelines.
Developers gain access to the Maia SDK, which includes PyTorch integration, a Triton compiler, and optimized kernels. For specialized workloads, the SDK's low-level programming language allows fine-grained control, while the toolchain as a whole is designed to keep code compatible across Microsoft's heterogeneous AI hardware.
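As an illustration of the Triton path, the sketch below is a standard, device-agnostic Triton kernel of the kind the SDK's compiler is described as accepting. The function names are hypothetical, and how such kernels are lowered onto Maia 200 hardware is an assumption based on the SDK description above, not a documented API.

```python
# Minimal Triton kernel sketch: a scaled elementwise add. The kernel uses only
# standard Triton constructs; targeting Maia 200 through the Maia SDK's Triton
# compiler is assumed here, not shown by any published example.
import torch
import triton
import triton.language as tl

@triton.jit
def scaled_add_kernel(x_ptr, y_ptr, out_ptr, n, scale, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offs = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n                       # guard the ragged last block
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + scale * y, mask=mask)

def scaled_add(x: torch.Tensor, y: torch.Tensor, scale: float = 1.0) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)        # one program instance per 1024-element block
    scaled_add_kernel[grid](x, y, out, n, scale, BLOCK=1024)
    return out
```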
Unlike competing accelerators that rely on custom fabrics, the Maia 200 uses Ethernet-based networking, which reduces complexity and cost. Its liquid-cooled Heat Exchanger Unit and pre-silicon validation, spanning chip design through datacenter deployment, cut time-to-production by half compared with prior Microsoft AI projects.
For enterprises and researchers, the Maia 200 represents a leap in inference efficiency, particularly for LLMs where precision trade-offs are critical. Its integration with Azure ensures seamless scaling, while the SDK lowers barriers for custom model optimization. With deployment underway, Microsoft’s push into first-party AI silicon underscores its commitment to reducing reliance on third-party accelerators in the long term.
