NVIDIA Blackwell Tops MLPerf Training 6.0 With Fastest, Largest Results
NVIDIA’s Blackwell architecture swept MLPerf Training 6.0, posting the fastest time in every benchmark category. As artificial intelligence systems grow in scale, the demands on training clusters have intensified, making benchmark performance a critical indicator of commercial viability. The latest peer-reviewed results show that NVIDIA’s rack-scale systems, in both GB200 NVL72 and next-generation GB300 NVL72 configurations, posted the fastest training times in the round.
This benchmark cycle introduced two new mixture-of-experts pretraining workloads, DeepSeek-V3 671B and GPT-OSS-20B, reflecting the shift toward sparse architectures. NVIDIA remained the only vendor to submit in all seven categories, recording the fastest training time in each.
A key driver is the integration of fifth-generation NVLink switches within the NVL72 platforms, which interconnect all 72 GPUs into a single high-bandwidth compute fabric. This design eases the all-to-all communication bottlenecks inherent in large-scale training, which are especially acute for mixture-of-experts models that route tokens to experts on different GPUs. Additionally, training in NVFP4, NVIDIA’s 4-bit floating-point format, enables high-throughput pretraining while still meeting strict accuracy thresholds, as demonstrated by the 550-billion-parameter Nemotron 3 Ultra model. The upgraded GB300 system further advances this capability, delivering up to 1.6 times the throughput of the GB200 through increased compute density and optimized power delivery.
Scale and production resilience were equally emphasized. NVIDIA made its largest submission to date by scaling DeepSeek-V3 across 8,192 GPUs, alongside a 5,120-GPU run for Llama 3.1 405B. To sustain multi-week training runs, NVIDIA engineered its stack around proactive failure prevention and rapid recovery.
The Reliability, Availability and Serviceability (RAS) engine continuously monitors chip health and automatically reroutes around degraded components, while Spectrum-X Ethernet reroutes network traffic in milliseconds. The NVIDIA Resiliency Extension minimizes downtime by detecting underperforming nodes and resuming training from recent checkpoints rather than restarting entire jobs.
The commercial impact is already evident across NVIDIA’s partner ecosystem. Nineteen organizations, including CoreWeave, Microsoft Azure, Google Cloud, and Nebius, used Blackwell systems for demanding production workloads. CoreWeave’s deployments enabled Cohere to accelerate its agentic AI platform threefold, while Midjourney is expanding its Ultra fleet to train next-generation models. On Google Cloud, Thinking Machines Lab reported a twofold speedup for frontier research, and Nebius helped Higgsfield, a platform now processing millions of assets daily, cut training time by 30 percent. These outcomes underscore Blackwell’s role as the foundational stack for frontier AI development, compressing iteration cycles and accelerating time-to-revenue.
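To make the checkpoint-based recovery described above concrete, here is a minimal Python sketch of why resuming from a recent checkpoint costs less than restarting a job. It is a toy simulation under stated assumptions, not NVIDIA's implementation and not the NVIDIA Resiliency Extension API: the names `Checkpoint`, `ToyTrainer`, `checkpoint_every`, and `fail_at` are hypothetical, and a training step is reduced to incrementing a counter.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    step: int     # training step at which the snapshot was taken
    state: float  # stand-in for model and optimizer state


@dataclass
class ToyTrainer:
    """Hypothetical toy trainer used only to illustrate checkpoint resume."""
    total_steps: int
    checkpoint_every: int
    fail_at: set = field(default_factory=set)  # steps where a fault is simulated
    steps_executed: int = 0                    # all work done, including redone steps

    def run(self) -> float:
        last = Checkpoint(step=0, state=0.0)
        step, state = last.step, last.state
        pending_faults = set(self.fail_at)
        while step < self.total_steps:
            if step in pending_faults:
                # Simulated node fault: drop in-memory progress and roll back
                # to the most recent checkpoint instead of to step 0.
                pending_faults.discard(step)
                step, state = last.step, last.state
                continue
            state += 1.0  # placeholder for one optimizer step
            step += 1
            self.steps_executed += 1
            if step % self.checkpoint_every == 0:
                last = Checkpoint(step=step, state=state)
        return state


trainer = ToyTrainer(total_steps=100, checkpoint_every=10, fail_at={37})
final_state = trainer.run()
redone_steps = trainer.steps_executed - trainer.total_steps
```

In this toy run, a fault at step 37 with checkpoints every 10 steps rolls the job back to step 30, so 7 steps are redone. A full restart would redo all 37. Shorter checkpoint intervals shrink the redone work further, at the cost of more frequent snapshot writes.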
