NVIDIA Launches NeMo Retriever for General Agent Retrieval

NVIDIA's NeMo Retriever team announced that its new agentic retrieval pipeline has taken first place on the ViDoRe v3 leaderboard and second place on the BRIGHT reasoning benchmark, results the team presents as evidence of the pipeline's versatility and adaptability.

Traditional retrieval relies primarily on semantic similarity, which often falls short on complex documents and deep reasoning tasks. To address this, the NeMo team built an agent-based retrieval pipeline on the ReAct architecture. The system runs a dynamic interaction loop between a large language model and a retriever: it plans searches, iterates on queries, and evaluates results on its own rather than depending on a single-shot query. When an agent reaches its step limit, the system automatically falls back to Reciprocal Rank Fusion (RRF) as a safety mechanism so that the task still completes.

On the engineering side, the team replaced a traditional Model Context Protocol (MCP) server architecture with a thread-safe singleton retriever. This change eliminated network transmission latency and process configuration complexity while substantially raising GPU utilization and experimental throughput, which lets agentic retrieval run efficiently across large-scale benchmarks.

Test data points to strong generalization. On ViDoRe v3, which focuses on complex visual layouts, NeMo Retriever led competitors with an NDCG@10 of 69.22, whereas rival systems showed marked performance declines on other domains within the same dataset. Although single-query latency (about 136 seconds) and cost exceed those of conventional dense retrieval, the agentic approach remains difficult to replace in scenarios that require complex logic and visual understanding.
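The Reciprocal Rank Fusion fallback mentioned above can be illustrated with a minimal sketch. The article does not say which rankings the pipeline fuses or which constants it uses, so everything here is an assumption: the function name `reciprocal_rank_fusion` is hypothetical, the inputs are imagined to be the ranked lists returned by the queries the agent had already issued, and `k=60` is simply the value commonly used with RRF.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60, top_n=10):
    """Fuse several ranked lists of document ids into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    where rank starts at 1. k=60 is a common default, not a value taken
    from the article.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document id for determinism.
    ordered = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [doc_id for doc_id, _ in ordered[:top_n]]


# Hypothetical result lists from three queries issued before the step limit.
partial_rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["b", "c", "a"],
]
fused = reciprocal_rank_fusion(partial_rankings)
print(fused)
```

Because RRF uses only rank positions, it needs no score calibration across queries, which is one plausible reason to use it as a fallback when the agent cannot finish its own evaluation.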
According to NVIDIA, future work will focus on distilling these reasoning patterns into smaller open-source models to reduce both latency and operating cost. The pipeline is already configurable: developers can pair various large language models with NVIDIA's commercial embedding models to build enterprise retrieval workflows tailored to specific business needs.
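The thread-safe singleton retriever described in the article can be pictured with a small sketch. This is not NVIDIA's implementation: the class `SingletonRetriever`, its `get` and `search` methods, and the keyword-overlap scoring are all hypothetical stand-ins, with the toy in-memory corpus taking the place of a GPU-resident embedding model. The point is only the pattern: one shared in-process instance, created once under a lock, that many agent threads call directly instead of going through a separate server.

```python
import threading


class SingletonRetriever:
    """Hypothetical in-process retriever shared by all agent threads."""

    _instance = None
    _instance_lock = threading.Lock()

    def __init__(self, corpus):
        self._corpus = dict(corpus)
        # Serializes access to the shared resource (a GPU model in practice).
        self._search_lock = threading.Lock()

    @classmethod
    def get(cls, corpus=None):
        # Double-checked locking: only the first caller builds the instance.
        if cls._instance is None:
            with cls._instance_lock:
                if cls._instance is None:
                    cls._instance = cls(corpus or {})
        return cls._instance

    def search(self, query, top_k=3):
        # Toy scoring: number of query terms shared with each document.
        terms = set(query.lower().split())
        with self._search_lock:
            scored = [
                (len(terms & set(text.lower().split())), doc_id)
                for doc_id, text in self._corpus.items()
            ]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        return [doc_id for score, doc_id in scored[:top_k] if score > 0]


corpus = {
    "d1": "gpu utilization tips",
    "d2": "retrieval with agents",
    "d3": "agents and gpu retrieval",
}
retriever = SingletonRetriever.get(corpus)

# Eight worker threads all receive the same instance.
seen = []
workers = [
    threading.Thread(target=lambda: seen.append(SingletonRetriever.get()))
    for _ in range(8)
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
```

Calling the retriever in-process avoids a network hop and a separately configured server process, which matches the latency and configuration benefits the article attributes to this design.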
