NVIDIA Unveils Vera Rubin With Groq's LPX to Break Into Inference, a Market Where It Has Never Been First

By Muhammad Zuhair

NVIDIA's Groq partnership has now been formalized: at GTC 2026, Jensen Huang unveiled a hybrid compute tray that pairs Groq's third-generation LPU units with a Rubin rack.

NVIDIA's Idea With Groq Is to Target 'High-Speed' Workloads, Hoping to Crack the Inference Competition

The debate over what NVIDIA would do with Groq has been ongoing for quite some time, and we have been tracking the developments closely. At GTC 2026, NVIDIA unveiled a new Vera Rubin hybrid compute tray, the Groq 3 LPX, featuring eight of the as-yet-unannounced Groq3 units, which we'll discuss below. According to NVIDIA, LPX and Rubin together deliver unprecedented inference performance, with a 35x increase in inference throughput per megawatt, which is why Groq's technology was key to NVIDIA's push into the inference market.

As for the compute hardware itself, a full rack carries 256 LPUs, bringing 128 GB of on-chip SRAM and 640 TB/s of scale-up bandwidth. This is NVIDIA's answer to what Cerebras and other competitors are doing in inference: by combining Rubin GPUs with LPUs, NVIDIA targets both the prefill and decode stages of inference, letting the company compete in a market where 'they aren't the first ones'. An individual Groq3 chip offers 500 MB of SRAM, 150 TB/s of SRAM bandwidth, and 1.2 PFLOPs of FP8 compute.
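The per-chip and per-rack figures quoted above can be cross-checked with some quick arithmetic. This is a minimal sketch, assuming the rack's SRAM capacity is simply the sum of per-chip SRAM (in decimal units) and that the chips contribute FP8 compute additively; the constants come from the numbers reported above:

```python
# Sanity check of the quoted Groq3 LPU specs: 256 chips per rack,
# 500 MB SRAM and 1.2 PFLOPs FP8 per chip (figures as reported).
CHIPS_PER_RACK = 256
SRAM_PER_CHIP_MB = 500        # on-chip SRAM per Groq3 LPU
FP8_PER_CHIP_PFLOPS = 1.2     # FP8 compute per Groq3 LPU

rack_sram_gb = CHIPS_PER_RACK * SRAM_PER_CHIP_MB / 1000  # decimal GB
rack_fp8_pflops = CHIPS_PER_RACK * FP8_PER_CHIP_PFLOPS

print(f"Rack SRAM: {rack_sram_gb:.0f} GB")         # 128 GB, matching the quoted rack figure
print(f"Rack FP8:  {rack_fp8_pflops:.1f} PFLOPs")  # 307.2 PFLOPs from the LPUs alone
```

The SRAM total lands exactly on the 128 GB NVIDIA quotes for the rack; the LPUs' FP8 compute comes to 307.2 PFLOPs, so the remainder of any combined tray figure would presumably come from the Rubin GPUs.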
When Rubin and Groq's LPX tray are combined, NVIDIA's CEO says the total AI inference compute reaches up to 315 PFLOPs. NVIDIA describes the design as follows:

Optimized for trillion-parameter models and million-token context, the codesigned LPX architecture pairs with Vera Rubin to maximize efficiency across power, memory and compute. The additional throughput per watt and token performance unlocks a new tier of ultra-premium, trillion-parameter, million-context inference, expanding revenue opportunity for all AI providers.

The idea is that Groq's LPU units will play a role similar to the one Mellanox played in networking, and that this hybrid architecture will give NVIDIA a head start on latency-sensitive workloads. With agentic AI shaping up as the industry's next 'inflection' point, keeping pace with compute demand is essential for NVIDIA, which is why the Groq partnership came at a vital time for Team Green.
