NVIDIA Blackwell Tops AI Bench

NVIDIA Blackwell Ultra NVL72 Leads First Agentic AI Infrastructure Benchmark

Artificial Analysis has published the initial results from AgentPerf, the industry’s first standardized benchmark for evaluating agentic artificial intelligence infrastructure. The data places the NVIDIA Blackwell Ultra NVL72 platform (GB300 NVL72) in the lead and highlights a shift in how AI workloads are measured and deployed at scale.

Agentic AI operates differently from traditional conversational models. A single chat completion is an isolated request, whereas an agent executes a continuous workflow, chaining dozens to hundreds of language model calls, tool executions, and context updates until a complex objective is met. This compounding complexity introduces latency and throughput challenges that existing inference benchmarks do not capture.

AgentPerf addresses this gap by measuring simultaneous agent capacity, responsiveness, and energy efficiency under production conditions. The benchmark uses DeepSeek V4 Pro, a large mixture-of-experts model, and simulates real-world coding agent trajectories involving file reading, code generation, command execution, and iterative refinement. Tool calls are simulated using representative CPU processing times to isolate accelerator performance.

Results indicate the NVIDIA GB300 NVL72 delivers up to 20 times more concurrent agents per megawatt than the NVIDIA HGX H200 system. The gains hold at service-level objectives of both 20 and 60 tokens per second per agent.

This efficiency advantage stems from full-stack co-design. The GB300 NVL72 integrates 72 GPUs into a single rack-scale architecture, enabling efficient distribution of mixture-of-experts models. Optimized CUDA kernels overlap communication and computation, masking coordination latency.
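To make the headline metric concrete, here is a minimal Python sketch of how "concurrent agents per megawatt" at a per-agent speed target could be computed. It is an illustration under stated assumptions, not AgentPerf's published methodology: the `SystemMeasurement` type, the function names, and every number are hypothetical placeholders rather than benchmark results.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SystemMeasurement:
    """Hypothetical record of one system under test (not an AgentPerf type)."""
    name: str
    power_kw: float  # assumed total system power draw, in kilowatts
    # concurrency level -> observed tokens/sec per agent (placeholder values)
    per_agent_tps: Dict[int, float]


def max_agents_at_slo(m: SystemMeasurement, slo_tps: float) -> int:
    """Largest tested concurrency whose per-agent rate still meets the SLO."""
    passing = [n for n, tps in m.per_agent_tps.items() if tps >= slo_tps]
    return max(passing, default=0)


def agents_per_megawatt(m: SystemMeasurement, slo_tps: float) -> float:
    """Agents sustained at the SLO, normalized to one megawatt of power."""
    return max_agents_at_slo(m, slo_tps) * 1000.0 / m.power_kw


if __name__ == "__main__":
    # Placeholder numbers chosen only to exercise the arithmetic.
    demo = SystemMeasurement(
        name="hypothetical-rack",
        power_kw=100.0,
        per_agent_tps={100: 80.0, 200: 65.0, 400: 30.0, 800: 15.0},
    )
    for slo in (20.0, 60.0):  # the two per-agent SLOs named in the article
        print(slo, agents_per_megawatt(demo, slo))
```

With these placeholder values the sketch reports 4,000 agents per megawatt at the 20 tokens-per-second target and 2,000 at the 60 tokens-per-second target, which shows why a stricter per-agent target lowers the number of agents a fixed power budget can serve.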
NVIDIA TensorRT LLM decouples input processing (prefill) from output generation (decode), sustaining throughput as concurrent agent sessions increase.

The benchmark methodology was designed to reflect actual production behavior, giving enterprises actionable metrics for infrastructure investment and power allocation.

Early ecosystem adoption supports the practical viability of these capabilities. Inference providers including Baseten, DeepInfra, and Together AI are already routing production agentic workloads to Blackwell infrastructure. Together AI uses the platform for real-time inference powering Cursor, an agentic coding assistant. DeepInfra uses the architecture to deploy Pam.ai, an autonomous workforce solution for automotive service operations.

As inference software and open-source frameworks mature, performance on agentic workloads is projected to improve further. NVIDIA has confirmed that the Vera Rubin architecture has entered full production, providing the next generation of computing capacity to meet scaling demands.

The publication of AgentPerf establishes a new industry standard for evaluating AI infrastructure, enabling developers and enterprises to make data-driven decisions as agentic applications become enterprise-critical systems.
