NVIDIA cuVS Boosts AI Search with GPU-Driven Indexing, Real-Time Performance, and Broader Ecosystem Support
NVIDIA has enhanced its cuVS library to optimize vector search for indexing and real-time retrieval, addressing the growing demand for high-performance AI systems. cuVS, designed for developers and data scientists, accelerates GPU-based vector search and clustering, enabling faster index creation, real-time updates, and scalable solutions for applications such as retrieval-augmented generation (RAG), recommendation systems, and anomaly detection. The latest release introduces advanced indexing algorithms, expanded language support, and deeper integrations with major AI platforms. Key partnerships include Meta's FAISS, Google Cloud's AlloyDB and Vertex AI, Oracle, Milvus, Apache Lucene, Elasticsearch, OpenSearch, Weaviate, and Kinetica. These collaborations aim to streamline AI workflows by leveraging GPU acceleration for data processing while maintaining compatibility with existing CPU infrastructure.

GPU-Driven Indexing Speeds
cuVS now supports GPU-accelerated index building, with graph-based algorithms such as Vamana (the index-build algorithm behind DiskANN) running up to 40x faster than CPU-based methods. NVIDIA is working with Microsoft to adapt these algorithms for GPU use, while Google Cloud AlloyDB has demonstrated a 9x speedup over CPU-based pgvector for HNSW index creation. Oracle's integration with AI Vector Search in Oracle Database 23ai shows a 5x end-to-end improvement. Weaviate's recent cuVS integration uses the GPU-native CAGRA algorithm to cut index build times by 8x on GPUs, with seamless fallback to CPU-based HNSW search. Apache Lucene and Solr also benefit from cuVS, with index builds accelerating up to 40x and 6x, respectively, and Elasticsearch plans to adopt cuVS for GPU-accelerated indexing, targeting a 9.4x speedup. (A minimal build-and-search sketch appears after this section.)

CPU-GPU Interoperability
A critical feature of cuVS is its ability to build indexes on GPUs that can then be searched on CPU-based systems, reducing costs and making use of existing infrastructure. For example, cuVS builds CAGRA graphs on GPUs and converts them to HNSW-compatible formats for CPU search, or builds DiskANN/Vamana indexes on GPUs before transferring them to CPUs. This interoperability extends to Meta's FAISS library, which with cuVS now accelerates CPU-based index builds by up to 12x and GPU index builds by 8x or more; new FAISS Python packages ship with cuVS support. (The GPU-build, CPU-search handoff is also sketched below.)

Efficiency Through Quantization
cuVS incorporates scalar and binary quantization techniques, which shrink vector storage footprints by 4x and 32x, respectively, while outperforming equivalent CPU implementations. Milvus has integrated CAGRA to build graphs directly on quantized vectors, improving efficiency for specialized vector databases.

High-Throughput Search Improvements
The dynamic batching API in cuVS reduces latencies for high-volume online search workloads by up to 10x. For latency-sensitive applications such as ad serving or trading pipelines, the CAGRA persistent search feature boosts throughput by 8x while keeping per-query delays low. NVIDIA has also improved CAGRA's prefiltering, maintaining high recall even when a filter excludes 99% of the vectors from consideration.

Data Analysis and Scalability
cuVS's nn-descent algorithm now constructs kNN graphs out-of-core, processing datasets larger than GPU memory in batches. This enables near-real-time exploratory data analysis at scales previously impractical on CPUs. Tools like RAPIDS cuML and BERTopic leverage cuVS for tasks such as topic modeling and single-cell genomics, demonstrating its versatility across domains.
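To make the indexing workflow concrete, here is a minimal sketch of building a CAGRA index on the GPU and searching it with the cuVS Python API. It assumes the cuvs Python package (e.g., cuvs-cu12) and CuPy are installed and a CUDA-capable GPU is available; the dataset is random and the parameter values are illustrative, not tuned.

```python
# Minimal sketch: GPU-accelerated CAGRA index build and search with cuVS.
# Assumes the cuvs Python package and CuPy; data and parameters are
# illustrative stand-ins, not tuned settings.
import cupy as cp
from cuvs.neighbors import cagra

# Random vectors standing in for real embeddings, resident in GPU memory.
dataset = cp.random.random((100_000, 128), dtype=cp.float32)
queries = cp.random.random((1_000, 128), dtype=cp.float32)

# Build the CAGRA graph index on the GPU.
build_params = cagra.IndexParams(graph_degree=64)
index = cagra.build(build_params, dataset)

# Search the index: returns device arrays of distances and neighbor ids.
search_params = cagra.SearchParams(itopk_size=128)
distances, neighbors = cagra.search(search_params, index, queries, k=10)
```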
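The GPU-to-CPU handoff described under CPU-GPU Interoperability can be sketched the same way. The snippet below converts the CAGRA index from the previous example into an HNSW-compatible index and searches it with host-side queries; it is based on the cuvs.neighbors.hnsw module, and exact signatures and parameter names may vary between cuVS releases.

```python
# Sketch: convert a GPU-built CAGRA index to an HNSW-compatible index
# searchable on CPU. Reuses `index` from the previous snippet; module and
# parameter names follow recent cuVS releases and may differ by version.
import numpy as np
from cuvs.neighbors import hnsw

# Convert the CAGRA graph into an HNSW-format index for CPU search.
hnsw_index = hnsw.from_cagra(hnsw.IndexParams(), index)

# Queries live in host (CPU) memory here.
host_queries = np.random.random((1_000, 128)).astype(np.float32)
search_params = hnsw.SearchParams(ef=64)
distances, neighbors = hnsw.search(search_params, hnsw_index, host_queries, k=10)
```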
Getting Started
cuVS is available as a standalone library or through integrations with FAISS, Milvus, Weaviate, and others. Developers can access end-to-end examples and an automated tuning guide via the rapidsai/cuvs GitHub repository, and cuVS Bench allows users to compare ANN search performance across GPU and CPU environments (a hand-rolled recall check is sketched below). By combining GPU acceleration with CPU compatibility, cuVS addresses bottlenecks in AI workflows, reducing costs and improving scalability. Its expanding ecosystem of partnerships and language support underscores its role in advancing AI-driven search and retrieval systems.
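For a quick sanity check before reaching for cuVS Bench, an exact brute-force pass can serve as ground truth for measuring ANN recall. This sketch assumes the cuvs.neighbors.brute_force module and reuses `dataset`, `queries`, and the CAGRA `neighbors` from the earlier snippets; the recall computation itself is ordinary Python over device arrays.

```python
# Sketch: measure CAGRA recall@10 against exact brute-force ground truth.
# Reuses `dataset`, `queries`, and `neighbors` from the snippets above;
# brute_force module names follow the cuVS Python docs as we read them.
import cupy as cp
from cuvs.neighbors import brute_force

# Exact k-NN results as ground truth.
bf_index = brute_force.build(dataset)
_, true_neighbors = brute_force.search(bf_index, queries, k=10)

# Recall@10: fraction of exact neighbors the ANN search also returned.
ann = cp.asarray(neighbors)
exact = cp.asarray(true_neighbors)
matches = sum(
    len(set(ann[i].tolist()) & set(exact[i].tolist()))
    for i in range(len(queries))
)
recall = matches / (len(queries) * 10)
print(f"recall@10: {recall:.3f}")
```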