Exploring the Inner Workings of a Distributed Vector Database: Key Insights and Architectural Choices for Scalable ANN Search
Integrating vector search into a semantic engine prototype revealed complexities beyond simply achieving top-k accuracy. The real challenge lies in the infrastructure: scaling ingestion, building indexes efficiently, maintaining read consistency, and hitting latency targets under heavy load. This led to an in-depth exploration of how open-source distributed vector databases, such as Milvus, manage these tasks.

Why a Vector DB Needs a New Architecture

Traditional databases are designed for transactional workloads: key-value, document, and relational operations. Vector databases, by contrast, are optimized for compute-intensive tasks such as similarity search and index building. Attempting to fit these tasks into monolithic OLTP systems results in poor parallelism and suboptimal performance. Consequently, vector databases require a specialized architecture.

System Architecture of Milvus

Milvus adopts a four-layer design that aligns well with distributed data engineering principles:

- Coordinator Layer: Manages overall coordination and distribution of tasks.
- Compute Layer: Handles the actual computation, divided into specialized services.
- Storage Layer: Manages data storage and retrieval.
- Access Layer: Facilitates interaction with the database through APIs and interfaces.

Specialized Nodes

The compute layer is further decomposed into three specialized node types:

- QueryNode: Handles search queries. Adding QueryNodes can significantly reduce latency under heavy load: in tests with 50 million vectors at 300 QPS, adding two QueryNodes cut 95th-percentile latency from 780 ms to around 420 ms.
- DataNode: Critical for streaming workloads, such as real-time document uploads or vector log ingestion. Isolating ingestion from search keeps search performance unaffected by data writes.
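The scatter-gather pattern behind QueryNode scaling can be sketched in a few lines of pure Python. This is illustrative only: the node layout, function names, and brute-force distance scan are assumptions for the sketch, not the Milvus API. Each simulated QueryNode computes a local top-k over its own segments, and a coordinator merges the partial results into the global answer.

```python
import heapq
import math

def local_topk(segments, query, k):
    # One simulated QueryNode: brute-force scan over only its own
    # segments, returning the k nearest (distance, id) pairs.
    hits = ((math.dist(vec, query), vid) for seg in segments for vid, vec in seg)
    return heapq.nsmallest(k, hits)

def scatter_gather(nodes, query, k):
    # Coordinator: fan the query out to every node, then merge the
    # already-sorted local top-k lists into a global top-k.
    partials = [local_topk(node, query, k) for node in nodes]
    return heapq.nsmallest(k, heapq.merge(*partials))

# Two simulated QueryNodes, each holding one segment of (id, vector) rows.
node_a = [[(1, (0.0, 0.0)), (2, (5.0, 5.0))]]
node_b = [[(3, (1.0, 1.0)), (4, (9.0, 9.0))]]
print(scatter_gather([node_a, node_b], query=(0.0, 0.0), k=2))
# → [(0.0, 1), (1.4142135623730951, 3)]
```

Because each node only touches its own segments, adding nodes shrinks the per-node scan, which is the mechanism behind the latency drop described above.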
- IndexNode: Manages index building, which can be CPU-intensive, especially for algorithms like Hierarchical Navigable Small World (HNSW) built with a high efConstruction setting. Keeping index builds on separate nodes prevents them from degrading search latency.

Segments: The Unit of Data and Isolation

In Milvus, data is divided into immutable blocks called segments, each typically 512 MB in size. Segments are independently indexed and searchable, which provides several benefits:

- Freshness: Smaller segments are sealed and indexed more frequently, improving recall for recent data without significantly increasing total query latency.
- Performance Tuning: Adjusting segment size can optimize for different workloads. For example, reducing the segment size from 512 MB to 256 MB on a dataset of 100 million vectors improved recall for recent data while keeping query latency stable.

Consistency Levels

Vector databases like Milvus offer several read-consistency levels, each with trade-offs:

- Strong Consistency: Guarantees the most up-to-date data at the cost of higher latency; suitable for admin panels and dashboards.
- Bounded Staleness: Allows reads to lag slightly in exchange for better performance; a good fit for monitoring and stream replays.
- Session Consistency: Ensures users see their own writes within a session; essential for interactive applications using Retrieval-Augmented Generation (RAG).
- Eventual Consistency: Accepts slight delays in data availability for the lowest read cost; best for offline processing and logs.

Indexing Is Not a Background Job

Milvus treats indexing as a controlled process rather than a black-box batch job. This allows for:

- Deterministic Control: Critical for dynamic environments like e-commerce, where embedding distributions change frequently.
- Optimization: Tailoring the indexing process to specific use cases improves overall efficiency and performance.
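One way to reason about these consistency levels is through a guarantee timestamp: the query waits until the serving node has applied all writes up to that timestamp before searching. The sketch below is a simplified model of that idea, not Milvus internals; the function name, the 2-second staleness bound, and the float timestamps are illustrative assumptions.

```python
GRACEFUL_TIME = 2.0  # assumed bounded-staleness tolerance, in seconds

def guarantee_ts(level, now, last_write_ts):
    # Map each consistency level to the timestamp the read must wait for.
    if level == "Strong":
        return now                  # see every write up to "now" (slowest)
    if level == "Bounded":
        return now - GRACEFUL_TIME  # tolerate a short replication lag
    if level == "Session":
        return last_write_ts        # read-your-own-writes within a session
    if level == "Eventual":
        return 0.0                  # search whatever is already applied
    raise ValueError(f"unknown consistency level: {level}")

now, last_write = 100.0, 97.5
for level in ("Strong", "Bounded", "Session", "Eventual"):
    print(level, guarantee_ts(level, now, last_write))
```

The trade-off is visible in the returned timestamps: the larger the guarantee timestamp, the longer a read may block waiting for replication, and the fresher the result.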
Deployment Notes

Key takeaways from testing:

- Resource Allocation: Size each node type for its specific workload to avoid bottlenecks.
- Load Balancing: Distribute queries evenly across QueryNodes.
- Monitoring and Maintenance: Monitor the system regularly to maintain performance and catch issues early.

Future Exploration

Next steps include:

- Advanced Performance Tuning: Experimenting with various configurations to optimize performance further.
- Scalability Benchmarks: Testing the system's limits and identifying potential pitfalls.
- Real-World Use Cases: Applying the architecture to practical scenarios to understand its real-world effectiveness.

If you're evaluating vector databases, consider not only recall and latency but also whether the underlying architecture can scale efficiently. A robust architecture is the foundation for successful high-dimensional search in real-world deployments.

Industry Insights

Industry experts highlight the growing importance of vector databases in the AI landscape. Handling large volumes of high-dimensional data is crucial for applications ranging from recommendation systems to natural language processing, and companies like Meta, Google, and Amazon are investing heavily in these technologies. As an open-source solution, Milvus offers transparency and flexibility, making it a valuable tool for developers and researchers.

Company Profile

Milvus is an open-source vector similarity search engine developed by Zilliz. It is designed to handle massive datasets and provide scalable, efficient search. Milvus supports a variety of index types and is widely used in industries requiring complex data analysis and retrieval, such as e-commerce, healthcare, and finance. Its modular, flexible architecture makes it adaptable to a range of use cases and environments.