NVIDIA Launches New AI Models for Specialized Agents with Nemotron Vision and RAG Tools
NVIDIA has unveiled a new suite of open-source AI models in the Nemotron family, designed to power specialized agentic AI systems that combine reasoning, vision, retrieval, and safety. The models are built to support domain-specific workflows, real-world deployment, and compliance, with a focus on efficiency, accuracy, and openness.
At the heart of the release are three key models: Nemotron Nano 3, a 32B-parameter mixture-of-experts (MoE) model with only 3.6B active parameters, optimized for high throughput and low latency; Nemotron Nano 2 VL, a 12B multimodal model for document and video understanding; and Nemotron Parse 1.1, a 1B-parameter model for precise document layout and table extraction.
Nemotron Nano 3 is engineered for efficient agentic reasoning and excels at scientific reasoning, coding, math, and tool-calling tasks. Because its MoE architecture activates only a fraction of the model's parameters for each token, it reduces compute costs while enabling deeper exploration and self-reflection.
Nemotron Nano 2 VL achieves state-of-the-art performance on OCRBenchV2, using a hybrid Mamba-Transformer architecture for fast, accurate visual and textual reasoning. Trained on over 11 million high-quality samples, it supports tasks such as image and video Q&A, dense captioning, and multi-image reasoning. A key innovation, Efficient Video Sampling (EVS), reduces redundant video tokens, boosting throughput by up to 2.5x without sacrificing accuracy. The model is available in FP4, FP8, and BF16 precisions, runs efficiently on vLLM and TRT-LLM with support for NVIDIA NIM, and integrates into the AI Blueprint for video search and summarization.
Nemotron Parse 1.1 delivers leading accuracy on PubTabNet for image-based table recognition, extracting structured text, tables, and layout metadata with bounding boxes and semantic classes. This structured output improves retrieval accuracy and the quality of training data for both LLMs and VLMs.
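The efficiency of an MoE model like Nemotron Nano 3 comes from sparse activation: a router sends each token to only a few experts, so most weights sit idle on any given step. The following is a minimal sketch of generic top-k expert routing in plain Python, not NVIDIA's implementation. The expert count, `TOP_K`, the scalar "experts", and the router vectors are all hypothetical toy choices and say nothing about Nemotron Nano 3's actual architecture.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]

# Hypothetical toy sizes for illustration only; NOT Nemotron Nano 3's configuration.
NUM_EXPERTS = 8
TOP_K = 2
DIM = 4


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a short list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(scale: float) -> Expert:
    """Hypothetical stand-in 'expert' that scales its input elementwise.
    Real experts are feed-forward networks with their own weights."""
    return lambda x: [scale * v for v in x]


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert],
                k: int = TOP_K) -> Tuple[Vector, List[int]]:
    """Route one token through its top-k experts and mix their outputs.

    Returns the combined output and the indices of the experts that ran.
    """
    # Router score per expert: dot product of the token with that expert's router vector.
    logits = [sum(w * v for w, v in zip(row, x)) for row in router]
    # Keep only the k highest-scoring experts; the others are never evaluated.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Gate weights are renormalized over the selected experts only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for gate, i in zip(gates, top):
        y = experts[i](x)  # only k expert computations happen for this token
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out, top


if __name__ == "__main__":
    random.seed(0)
    router = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
    experts = [make_expert(1.0 + i) for i in range(NUM_EXPERTS)]
    token = [random.gauss(0.0, 1.0) for _ in range(DIM)]
    output, used = moe_forward(token, router, experts)
    print(f"experts evaluated: {sorted(used)} ({len(used)} of {NUM_EXPERTS})")
    print(f"output: {[round(v, 3) for v in output]}")
```

Applied to the figures quoted above, 3.6B active out of 32B total means roughly 11% of the parameters take part in each token's forward pass, which is where the compute savings come from.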
For retrieval-augmented generation (RAG), NVIDIA introduces Nemotron RAG, a suite of models enabling secure, scalable access to proprietary data. It supports enterprise-grade pipelines for AI agents that plan, retrieve, and act autonomously, powering applications from IT support to customer service. The models lead on benchmarks such as ViDoRe, MTEB, and MMTEB, making them well suited to high-performance RAG systems.
Safety is critical in agentic AI, and Llama 3.1 Nemotron Safety Guard 8B V3 addresses it with multilingual content moderation across 23 safety categories and nine languages. The model is fine-tuned on a culturally diverse dataset of 386K samples, built using LLM-driven cultural adaptation and consistency filtering, and detects harmful content with 84.2% accuracy and minimal latency. It runs on a single GPU and integrates with NeMo Guardrails for real-time, multilingual safety.
NVIDIA also open-sourced the NeMo Evaluator SDK and NeMo Agent Toolkit, enabling reproducible benchmarking and automated optimization of agent workflows. The Agent Optimizer tunes hyperparameters for accuracy, latency, and groundedness, reducing trial and error during development.
Developers can access all models and datasets on Hugging Face, and Nemotron Nano 2 VL is also available through inference providers such as Baseten and Replicate. APIs are hosted on build.nvidia.com and OpenRouter.
This release marks a major step toward powerful, responsible, and efficient agentic AI, giving developers the building blocks for intelligent, multimodal, and safe AI agents with real-world impact.
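The article does not describe Nemotron RAG's interfaces, so the following is only a minimal, self-contained sketch of the generic retrieve-then-generate pattern that such pipelines follow. The bag-of-words `embed` function, the toy `DOCS` corpus, and `build_prompt` are hypothetical stand-ins; in a real deployment, a dense embedding model (and typically a reranker) would replace `embed`, and the assembled prompt would be sent to an LLM.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical toy corpus standing in for proprietary enterprise documents.
DOCS: Dict[str, str] = {
    "vpn": "To reset your VPN token, open the IT portal and choose Reset Token.",
    "printer": "Printer jams are cleared by opening tray 2 and removing the paper.",
    "refund": "Customers can request a refund within 30 days of purchase.",
}


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a bag-of-words vector.
    A real pipeline would call a dense embedding model here instead."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, docs: Dict[str, str], k: int = 1) -> List[Tuple[str, float]]:
    """Return (doc_id, score) pairs for the k documents most similar to the query."""
    q = embed(query)
    scored = [(doc_id, cosine(q, embed(text))) for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


def build_prompt(query: str, docs: Dict[str, str], k: int = 1) -> str:
    """Ground the generator by prepending the retrieved passages to the question."""
    context = "\n".join(docs[doc_id] for doc_id, _ in retrieve(query, docs, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


if __name__ == "__main__":
    print(build_prompt("How do I reset my VPN token?", DOCS))
```

Keeping retrieval separate from generation is the design choice that lets an enterprise swap in stronger embedding or reranking models, or restrict which documents an agent may see, without retraining the LLM itself.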
