
Nvidia's Vera Rubin Platform Unveiled: Next-Gen AI and HPC Architecture with 144 GPUs, 288GB HBM4, and 3.6 ExaFLOPS Performance

Nvidia’s Vera Rubin platform represents the company’s most advanced and complex AI and high-performance computing (HPC) architecture to date, poised to redefine the limits of data center scalability and efficiency. Expected to launch in late 2026, the platform integrates nine distinct processors into a single, tightly coordinated rack-scale system designed for the most demanding generative AI and scientific computing workloads.

At its core, Vera Rubin is built around a suite of custom silicon: an 88-core Vera CPU, Rubin GPUs with 288 GB of HBM4 memory, Rubin CPX inference accelerators with 128 GB of GDDR7, NVLink 6.0 switch ASICs, BlueField-4 DPUs, and photonics-based networking components such as Spectrum-6 Ethernet and Quantum-CX9 InfiniBand NICs and switches.

A full NVL144 rack combines 144 Rubin GPUs (in 72 packages), 36 Vera CPUs, and 20,736 GB of HBM4 memory, delivering up to 3.6 NVFP4 ExaFLOPS for inference and 1.2 FP8 ExaFLOPS for training. The NVL144 CPX variant, optimized for inference, reaches nearly 8 NVFP4 ExaFLOPS by adding Rubin CPX accelerators, offering extreme compute density for large-scale language model serving.

The Vera CPU, built on a new Armv9.2 core internally named Olympus, features 88 cores with 2-way SMT for 176 threads. It uses a wide out-of-order pipeline and supports extensions such as SVE2, FP8/BF16 arithmetic, cryptography, and memory tagging. Memory bandwidth reaches 1.2 TB/s (20% higher than Grace) thanks to LPDDR5X SOCAMM2 modules, while a 1.8 TB/s bidirectional NVLink-C2C connection to the GPUs doubles Grace’s 900 GB/s. The CPU uses a multi-chiplet design with a visible I/O chiplet, suggesting advanced packaging and possibly external logic for I/O functions.

The Rubin GPU, codenamed R200, pairs two near-reticle-sized 3nm-class compute tiles from TSMC with dedicated I/O dies and eight stacks of HBM4 memory, totaling 288 GB at 13 TB/s of bandwidth.
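The rack-level figures follow directly from the per-package numbers quoted above; a quick sanity check (assuming 72 dual-die Rubin packages per NVL144 rack, each with 288 GB of HBM4 and 50 NVFP4 PFLOPS, as stated in the text):

```python
# Sanity-check the NVL144 rack totals from the per-package specs.
# Assumption from the article: 144 GPU dies ship as 72 two-die packages.

PACKAGES_PER_RACK = 72
GPUS_PER_PACKAGE = 2
HBM4_PER_PACKAGE_GB = 288        # eight HBM4 stacks per package
NVFP4_PFLOPS_PER_PACKAGE = 50    # inference throughput per package

gpus = PACKAGES_PER_RACK * GPUS_PER_PACKAGE
hbm4_total_gb = PACKAGES_PER_RACK * HBM4_PER_PACKAGE_GB
nvfp4_exaflops = PACKAGES_PER_RACK * NVFP4_PFLOPS_PER_PACKAGE / 1000

print(gpus)             # 144 GPUs
print(hbm4_total_gb)    # 20736 GB, i.e. about 20.7 TB of HBM4
print(nvfp4_exaflops)   # 3.6 NVFP4 ExaFLOPS
```

Note that the HBM4 total works out to roughly 20.7 TB per rack, which is why the figure is best quoted in gigabytes.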
It promises 50 FP4 PetaFLOPS and 16 FP8 PetaFLOPS (3.3x and 1.6x improvements over Blackwell Ultra), driven by enhanced low-precision compute and an optimized memory architecture. Power draw is estimated at 1.8 kW per GPU, which Nvidia says is manageable within existing cooling infrastructure such as the Oberon rack.

A follow-up Rubin Ultra platform, targeted for 2027, will double performance by adding two more compute tiles per GPU, reaching up to 100 PFLOPS per package and 1 TB of HBM4E memory with 32 TB/s of bandwidth. This will require a new Kyber rack and liquid cooling due to a projected 3.6 kW power draw.

The Rubin CPX GPU is a specialized inference accelerator designed to handle the compute-intensive prefill and context-processing stages of large language models. With 128 GB of GDDR7, it provides a cost- and power-efficient alternative to HBM4, enabling efficient handling of million-token contexts and multi-modal inputs. It works alongside Rubin GPUs in the NVL144 CPX system, with Nvidia’s Dynamo inference orchestrator dynamically offloading workloads between the two.

The BlueField-4 DPU integrates a 64-core Grace CPU and offloads networking, storage, security, and orchestration tasks, reducing host-CPU overhead and improving system efficiency. It runs DOCA, Nvidia’s software framework for data center automation and security.

For connectivity, NVLink 6.0 doubles per-link throughput to 3.6 TB/s, enabling 28.8 TB/s of total GPU-to-GPU bandwidth in the NVL144. NVLink 7.0 and NVSwitch 7.0 will follow in 2027, supporting higher port counts and larger-scale systems. Scale-out connectivity relies on co-packaged optics (CPO): Nvidia’s Spectrum-X Photonics Ethernet and Quantum-X Photonics InfiniBand platforms use TSMC’s COUPE technology to deliver up to 1.6 Tb/s per port. Quantum-X switches will offer up to 115 Tb/s of fabric bandwidth, while Spectrum-X will support 100 Tb/s to 400 Tb/s configurations.
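The prefill/decode split behind the NVL144 CPX design can be sketched as a toy scheduler: compute-heavy prefill of long contexts goes to CPX-style accelerators, while bandwidth-heavy decode stays on the HBM4-backed Rubin GPUs. Everything below (class names, pool labels, the 32K-token threshold) is an illustrative assumption, not Nvidia’s Dynamo API:

```python
# Toy sketch of disaggregated LLM serving in the spirit of the NVL144 CPX
# split described above. All names and the routing rule are hypothetical;
# a real orchestrator (e.g. Dynamo) balances load dynamically.

from dataclasses import dataclass

@dataclass
class Request:
    prompt_tokens: int   # context length to prefill
    output_tokens: int   # tokens to decode afterwards

def route_prefill(req: Request) -> str:
    """Pick where the prefill stage runs: long contexts gain most from CPX."""
    if req.prompt_tokens >= 32_000:      # illustrative cutoff
        return "cpx-pool"    # GDDR7-backed context/prefill accelerators
    return "rubin-pool"      # HBM4 GPUs absorb short prefills inline

print(route_prefill(Request(prompt_tokens=1_000_000, output_tokens=512)))  # cpx-pool
print(route_prefill(Request(prompt_tokens=2_000, output_tokens=512)))      # rubin-pool
```

Decode always runs on the Rubin pool in this sketch; the point of the split is that prefill arithmetic does not need HBM4 bandwidth, so cheaper GDDR7 capacity suffices there.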
The ConnectX-9 Spectrum-X SuperNIC, a 1.6 Tb/s network interface, enables zero-copy GPU-to-network transfers via GPUDirect Async and NIXL, reducing latency and CPU load.

Nvidia’s Vera Rubin platform is not just a hardware upgrade; it is a holistic system engineered for trillion-parameter agentic AI, multi-modal workloads, and massive-scale inference. It leverages new software features such as the Interconnect Extension Layer, Smart Router, GPU Planner, and NCCL 2.24 to optimize performance, reduce latency, and enable efficient, disaggregated AI computing. The platform is set to become the foundation of the next generation of AI data centers.