NVIDIA Advocates for Small Language Models to Drive Efficient and Cost-Effective Agentic AI Systems

NVIDIA is shifting its focus toward small language models (SLMs) for agentic AI, arguing that for many tasks they are better suited and more cost-effective than large language models (LLMs). The move is driven by practical considerations such as cost, latency, and operational overhead, as well as the constraints that come with LLMs, including hosting requirements and commercial commitments.

In a recent paper, NVIDIA argues that the industry's reliance on LLMs for every AI agent task is inefficient and often misguided. Instead, it proposes a "data flywheel" approach in which usage data is continuously analyzed and clustered to identify the most effective model for each recurring sub-task (the first code sketch at the end of this article illustrates the idea). SLMs, the authors argue, are powerful enough for many of these sub-tasks while offering significant advantages in speed, resource efficiency, and cost.

Current AI agent applications are often built around the demands of LLMs, a situation NVIDIA likens to the tail wagging the dog. By selecting models for specific sub-tasks and continually refining them with real-world usage data, the company believes agents can be better aligned with their intended functions and achieve better performance.

A key point in the paper is the economic and operational impact of moving even partially from LLMs to SLMs. Modern AI agents are currently built around very large language models, which make strategic decisions, control task flow, and break complex tasks into manageable sub-tasks. These LLMs are served from centralized cloud infrastructure designed to handle high volumes of diverse requests. NVIDIA argues that this generalized approach is both excessive and misaligned with the specific demands of most agentic use cases. SLMs, by contrast, deliver lower latency, require less memory and compute, and sharply reduce operating costs while still performing adequately in constrained domains.

For complex tasks, agents typically decompose the work into modular sub-tasks that specialized or fine-tuned SLMs can handle. This makes SLMs the default, with LLMs invoked only when necessary (the second sketch below illustrates this routing). With modern training, prompting, and agentic augmentation techniques, what matters is a model's capability rather than its sheer size, making SLMs a logical choice for many applications.

Jason Droege, Scale AI's current Chief Strategy Officer and interim CEO, commented: "This is one of the most sobering papers you will read. It brings much-needed sanity to the market."

NVIDIA's proposal underscores the importance of aligning AI architecture with practical needs so that agents are both efficient and effective. In short, the company's position is that small language models are the future of agentic AI, offering a more rational and sustainable approach to model selection and deployment, one that could make AI agents markedly more accessible and cost-effective across a wide range of applications.
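To make the "data flywheel" idea concrete, here is a minimal sketch of its clustering step: logged agent requests are embedded and grouped so that recurring sub-task families, the candidates for a specialized SLM, become visible. The sample logs, the TF-IDF embedding (a stand-in for a real embedding model), and the cluster count are illustrative assumptions, not details from NVIDIA's paper.

```python
# Hypothetical sketch: cluster logged agent requests to surface recurring
# sub-task families that a fine-tuned SLM could take over from the LLM.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Illustrative usage logs: prompts the agent actually sent to its LLM.
usage_logs = [
    "Extract the invoice total from this PDF text",
    "Extract the due date from this invoice",
    "Summarize this support ticket in one sentence",
    "Summarize the customer complaint below",
    "Write a SQL query to count orders per region",
    "Write a SQL query joining users and orders",
]

# Embed the prompts (TF-IDF stands in for a real embedding model) and
# group them into candidate sub-task clusters.
vectors = TfidfVectorizer().fit_transform(usage_logs)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(vectors)

# Each cluster is a candidate workload for a specialized SLM: its prompts
# become fine-tuning data, and its frequency suggests the potential savings.
for cluster_id in range(3):
    members = [p for p, label in zip(usage_logs, kmeans.labels_) if label == cluster_id]
    print(f"Cluster {cluster_id} ({len(members)} requests):")
    for prompt in members:
        print(f"  - {prompt}")
```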
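The routing pattern the paper advocates, SLM by default with LLM escalation only when needed, might look like the sketch below. The model names, the confidence threshold, and the call_model() helper are hypothetical placeholders for whatever inference stack an agent actually runs on.

```python
# Hypothetical sketch: try a cheap specialized SLM first, escalate to the
# expensive generalist LLM only when the SLM's answer looks unreliable.
from dataclasses import dataclass

@dataclass
class ModelResult:
    text: str
    confidence: float  # self-reported or verifier-scored, in [0, 1]

def call_model(model_name: str, prompt: str) -> ModelResult:
    # Stand-in for a real inference call (local SLM endpoint or hosted LLM
    # API); returns a canned answer so the sketch runs as-is.
    return ModelResult(text=f"[{model_name}] answer to: {prompt}", confidence=0.9)

def route(prompt: str, threshold: float = 0.8) -> str:
    """Default to the SLM; invoke the LLM only when confidence is low."""
    slm = call_model("specialized-slm-7b", prompt)  # hypothetical model name
    if slm.confidence >= threshold:
        return slm.text
    # Escalation path: the generalist LLM handles what the SLM could not.
    return call_model("generalist-llm", prompt).text

print(route("Extract the invoice total from this PDF text"))
```

Because calls are routed per sub-task, the threshold acts as a direct cost/quality dial: lowering it keeps more traffic on the cheap SLM, while raising it escalates to the LLM more often.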
