
NVIDIA Unveils New Nemotron Models for Smarter, Safer AI Agents with Vision, RAG, and Guardrail Capabilities

NVIDIA has introduced a new suite of open models and tools designed to help developers build specialized, efficient, and safe AI agents. The latest additions to the Nemotron family include models for vision-language understanding, retrieval-augmented generation (RAG), and content safety, all backed by open datasets, detailed training recipes, and optimized inference support.

At the core of the new offerings is Nemotron Nano 3, a 32-billion-parameter mixture-of-experts (MoE) model with only 3.6 billion parameters active at inference time. This efficient architecture delivers high throughput, improved accuracy in reasoning, coding, math, and tool use, and stronger self-reflection than dense models of similar size. The MoE design also reduces compute costs and latency, making it well suited to real-time agentic workflows.

Nemotron Nano 2 VL, a 12-billion-parameter vision-language model, excels at document intelligence and video understanding. Trained on over 11 million high-quality samples from the Nemotron VLM Dataset V2, it achieves top performance on benchmarks such as OCRBenchV2. Its hybrid Mamba-Transformer architecture enables fast token processing and low latency, while FP8 precision and context parallelism improve efficiency on long inputs. A key innovation, Efficient Video Sampling (EVS), prunes redundant video tokens by identifying static frames, boosting throughput by up to 2.5x without sacrificing accuracy. The model supports FP4, FP8, and BF16 quantization, runs efficiently via vLLM and TRT-LLM, and is available as an NVIDIA NIM. Developers can use the NVIDIA AI Blueprint for video search and summarization, or NeMo to build custom datasets and models.

For document processing, Nemotron Parse 1.1 is a compact 1-billion-parameter model that extracts structured text, tables, and layout information from images with high precision.
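EVS is NVIDIA's own technique inside the model's video pipeline, but the underlying intuition, that near-static frames add tokens without adding information, can be illustrated with a toy sketch. The `prune_static_frames` helper and its pixel-difference threshold below are illustrative assumptions, not the actual EVS algorithm:

```python
import numpy as np

def prune_static_frames(frames, threshold=2.0):
    """Keep the first frame, then drop any frame whose mean absolute
    pixel difference from the last kept frame is below `threshold`.
    A toy stand-in for the idea behind Efficient Video Sampling."""
    if not frames:
        return []
    kept = [frames[0]]
    for frame in frames[1:]:
        diff = np.abs(frame.astype(np.float32) - kept[-1].astype(np.float32)).mean()
        if diff >= threshold:
            kept.append(frame)
    return kept

# A "video" of 8x8 grayscale frames: three identical frames, then a change.
static = np.zeros((8, 8), dtype=np.uint8)
moving = np.full((8, 8), 50, dtype=np.uint8)
video = [static, static, static, moving]
print(len(prune_static_frames(video)))  # 2: one static frame plus the change
```

In the real system the pruned frames simply never become vision tokens, which is where the claimed throughput gain comes from.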
Nemotron Parse 1.1 leads on the PubTabNet benchmark and delivers rich, semantically labeled outputs that improve downstream retrieval and training accuracy.

To strengthen AI agents with reliable knowledge access, NVIDIA is releasing Nemotron RAG, a suite of models designed for enterprise-grade retrieval-augmented generation. These models ensure data privacy, support secure connections to internal data sources, and are optimized for real-time business applications. They power multi-agent systems, generative co-pilots for HR and IT, and intelligent summarization tools. The embedding models consistently rank among the best on industry benchmarks such as ViDoRe, MTEB, and MMTEB, making them a strong foundation for robust RAG pipelines.

Safety is critical in autonomous AI systems. The Llama 3.1 Nemotron Safety Guard 8B V3 is a multilingual content safety model trained on a culturally diverse dataset of over 386,000 samples spanning 23 safety categories and nine languages, including Arabic, Hindi, and Japanese. It detects harmful content in both prompts and responses with 84.2% accuracy and low latency. Two key innovations, LLM-driven cultural adaptation and consistency filtering, help the model understand local nuances and avoid misalignment. It is lightweight enough to run on a single GPU and integrates seamlessly with NeMo Guardrails for real-time moderation.

To help developers evaluate and optimize their agents, NVIDIA has open-sourced the NeMo Evaluator SDK, which enables reproducible benchmarking across static and dynamic workflows. The new ProfBench suite tests multi-step reasoning and tool usage. The NeMo Agent Toolkit, compatible with MCP, LangChain, CrewAI, and Semantic Kernel, includes an Agent Optimizer that automatically tunes hyperparameters such as LLM type, temperature, and max tokens to balance accuracy, speed, and cost.

All Nemotron models and datasets are available on Hugging Face.
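The embed-then-retrieve step at the heart of any RAG pipeline built on such embedding models can be sketched in a few lines. The 3-d vectors below are toy stand-ins for a real embedding model's output, not actual Nemotron embeddings:

```python
import numpy as np

def cosine_top_k(query_vec, doc_vecs, k=1):
    """Rank documents by cosine similarity to the query embedding
    and return the indices of the top-k matches."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return np.argsort(-scores)[:k]

# Toy 3-d "embeddings" standing in for real embedding-model outputs.
docs = np.array([
    [0.9, 0.1, 0.0],   # doc 0: about vacation policy
    [0.0, 0.8, 0.2],   # doc 1: about payroll
    [0.1, 0.1, 0.9],   # doc 2: about laptop provisioning
])
query = np.array([0.0, 0.9, 0.1])  # an HR payroll question
print(cosine_top_k(query, docs, k=1))  # [1]
```

A production pipeline would swap the toy vectors for embedding-model calls and the brute-force dot product for a vector index, but the ranking logic is the same.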
Additional deployment options are provided through inference providers like Baseten, Deep Infra, and Replicate. Developers can also test NVIDIA-hosted API endpoints on build.nvidia.com and OpenRouter. For the latest updates, follow NVIDIA AI on LinkedIn, X, Discord, and YouTube.
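The NVIDIA-hosted endpoints generally follow the OpenAI-compatible chat completions format, so any OpenAI-style client or plain HTTP request works against them. The endpoint URL and model identifier below are illustrative placeholders; check build.nvidia.com for the exact values:

```python
import json

# Illustrative only: NVIDIA-hosted endpoints are OpenAI-compatible, but
# confirm the actual base URL and model IDs on build.nvidia.com.
BASE_URL = "https://integrate.api.nvidia.com/v1/chat/completions"

def build_request(model, prompt, temperature=0.2, max_tokens=256):
    """Assemble the JSON body for an OpenAI-compatible chat request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }

# "nvidia/nemotron-model-id" is a placeholder, not a real model ID.
body = build_request("nvidia/nemotron-model-id", "Summarize this ticket.")
print(sorted(body))  # ['max_tokens', 'messages', 'model', 'temperature']
print(json.dumps(body["messages"]))
```

Sending `body` to `BASE_URL` with a bearer token then mirrors any other OpenAI-compatible deployment, which is what makes the hosted and self-hosted paths interchangeable.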
