Software Is Driving AI Performance Gains More Than Hardware
In a pivotal demonstration at GTC 2025, NVIDIA CEO Jensen Huang showed Pareto frontier curves illustrating how software advances now outpace hardware in driving AI performance gains. The curves plotted throughput (tokens per second per GPU) against interactivity (tokens per second per user), revealing how optimizations in software stacks such as Dynamo and TensorRT shift the performance boundary outward. Comparing Hopper H200 and Blackwell B200 GPUs, NVIDIA showed that a rack-scale Blackwell system leveraging 72 B200s, FP4 precision, and advanced parallelism delivered up to 25X higher performance per watt and per user than earlier H200-based setups. Measured directly from the chart, the improvement was an even more striking 31X, underscoring how software optimizations amplify hardware potential well beyond raw silicon gains.

The leap wasn't limited to dense models. For reasoning models, chain-of-thought systems such as GPT-OSS or DeepSeek R1, throughput per watt drops significantly because of the added computational overhead. Yet, thanks to software refinements, the Blackwell system still achieved a 40X advantage over Hopper systems on key metrics. This highlights a critical trend: as AI models evolve toward complex, multi-stage reasoning, software becomes the dominant lever for efficiency.

The InferenceMax v1 benchmark, which evaluates GPT-OSS 120B, DeepSeek R1-0528, and Llama 3.3 70B Instruct across various configurations, revealed an unprecedented pace of software-driven progress. Between August and October 2025, the Pareto frontier for the GPT-OSS model nearly doubled in performance across all points. Within weeks, enhancements to TensorRT, including optimized data parallelism across NVSwitch interconnects, pushed maximum throughput beyond 60,000 tokens per second per GPU and interactivity to nearly 500 TPS per user. Then, with the introduction of multi-token prediction (a form of speculative decoding), the system reached 1,000 TPS per user at peak interactivity and 5X higher throughput at typical workloads, delivering in weeks what once took two years.

This acceleration marks a paradigm shift: software now drives over 60% of performance gains per GPU generation. Hardware accounts for roughly 80% of NVIDIA's revenue, yet about 80% of its employees work on software, a split that reflects software's strategic centrality. The takeaway is clear: AI progress is no longer just about faster chips; it's about smarter software that unlocks latent capability in existing hardware.

Industry insiders stress that this pace of innovation demands continuous software updates; firms that fall behind risk large performance and cost inefficiencies. As one analyst put it, "In AI, the software is the new hardware." The ability to rapidly iterate on and optimize inference stacks is now a core competitive moat, with implications for cloud providers, enterprises, and model developers alike. NVIDIA's recent results exemplify an era in which the Pareto frontier is not a static tradeoff curve but a moving target, constantly reshaped by software.
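To make the throughput-versus-interactivity tradeoff concrete, here is a minimal sketch of how a Pareto frontier can be extracted from raw benchmark samples. The `Sample` type, configuration names, and numbers are hypothetical illustrations, not actual InferenceMax output: a configuration survives only if no other configuration matches or beats it on both axes.

```python
# Minimal sketch: extracting a Pareto frontier from benchmark samples.
# The sample data is hypothetical; real measurements would come from a
# harness such as InferenceMax. Each point pairs aggregate throughput
# (tokens/sec per GPU) with interactivity (tokens/sec per user).

from dataclasses import dataclass

@dataclass(frozen=True)
class Sample:
    config: str          # e.g. a parallelism / batch-size setting
    throughput: float    # tokens per second per GPU (aggregate)
    tps_per_user: float  # tokens per second per user (interactivity)

def pareto_frontier(samples: list[Sample]) -> list[Sample]:
    """Keep only non-dominated points: a sample survives unless some
    other sample is at least as good on both axes and strictly better
    on at least one."""
    frontier = []
    for s in samples:
        dominated = any(
            o.throughput >= s.throughput
            and o.tps_per_user >= s.tps_per_user
            and (o.throughput > s.throughput or o.tps_per_user > s.tps_per_user)
            for o in samples
        )
        if not dominated:
            frontier.append(s)
    # Sort by interactivity so the curve reads left to right.
    return sorted(frontier, key=lambda s: s.tps_per_user)

# Hypothetical measurements for illustration only.
runs = [
    Sample("tp8_bs256", 60_000, 80),
    Sample("tp8_bs64",  35_000, 250),
    Sample("tp8_bs16",  12_000, 480),
    Sample("tp4_bs64",  30_000, 200),  # dominated by tp8_bs64
]

for s in pareto_frontier(runs):
    print(f"{s.config}: {s.throughput:,.0f} tok/s/GPU at {s.tps_per_user:.0f} TPS/user")
```

Plotted, the surviving points trace a curve like the ones Huang showed: pushing per-user interactivity costs aggregate throughput, and a software upgrade that improves both axes moves the whole frontier outward rather than trading along it.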

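The multi-token prediction credited with the jump to 1,000 TPS per user is a form of speculative decoding. Below is a minimal, hypothetical sketch of the idea; `draft_next` and `target_next` are placeholder functions, not an actual NVIDIA or TensorRT API, and the acceptance rule shown is the simple greedy variant.

```python
# Minimal sketch of greedy speculative decoding, the accept/reject
# logic behind multi-token prediction. `draft_next` and `target_next`
# are hypothetical stand-ins for a cheap draft model and the large
# target model. For clarity the sketch queries the target one token
# at a time; a real stack verifies all k drafted tokens in a single
# batched forward pass, which is where the throughput win comes from.

from typing import Callable, List

def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    k: int = 4,
    max_new: int = 32,
) -> List[int]:
    tokens = list(prompt)
    generated = 0
    while generated < max_new:
        # 1. Draft model cheaply proposes k candidate tokens.
        ctx = list(tokens)
        draft = []
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Target model verifies the candidates in order.
        for t in draft:
            expected = target_next(tokens)
            if expected != t:
                # First mismatch: discard the rest of the draft and
                # keep the target's own token instead.
                tokens.append(expected)
                generated += 1
                break
            tokens.append(t)  # accepted: up to k tokens per round
            generated += 1
    return tokens
```

Because verification here is greedy, the output is token-for-token identical to what the target model alone would produce; the drafted tokens only change how much work each target step amortizes, which is how per-user interactivity can rise sharply without changing model quality.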