NVIDIA Unveils VSS Blueprint 2.4: Enhancing Video Analytics with Generative AI, Reasoning, and Edge Deployment

The latest release of the NVIDIA AI Blueprint for Video Search and Summarization (VSS), version 2.4, introduces powerful new capabilities that deepen the integration of computer vision pipelines with generative AI and reasoning. The update enables developers to build smarter, more context-aware video analytics systems that deliver real-time, actionable insights from both live and stored video streams.

A key advancement is the integration of NVIDIA Cosmos Reason, a 7-billion-parameter vision language model designed for physical-world reasoning. The model brings human-like understanding to AI agents by incorporating common sense, physics knowledge, and contextual awareness, allowing systems to analyze footage not just by detecting objects, but by interpreting events, understanding relationships between entities, and performing root-cause analysis. With native support in VSS 2.4, Cosmos Reason is tightly coupled with the video ingestion pipeline, enabling efficient batching and reduced latency, which is especially beneficial for edge and cloud deployments. The model is also customizable and fine-tunable on proprietary data, making it adaptable to specific industry needs.

VSS 2.4 significantly improves question answering through enhanced knowledge graph capabilities. The system uses GPU-accelerated video ingestion to break long videos into chunks, which Cosmos Reason processes to generate rich, descriptive captions. A large language model then uses these captions to construct a knowledge graph that captures entities, events, and their relationships. A new post-processing step removes duplicate nodes and edges, ensuring that the same object, such as a vehicle moving across multiple cameras, is represented as a single entity. This improves cross-camera tracking and enables accurate, context-aware answers to complex queries. To further boost accuracy, VSS 2.4 introduces agentic reasoning for knowledge graph traversal.
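The node de-duplication step described above can be illustrated with a minimal sketch. Everything here (the `Node` class, `merge_duplicates`, the sample labels) is hypothetical and not part of the VSS implementation; it merges nodes by exact label match, whereas a real pipeline would also rely on visual or embedding similarity.

```python
# Hypothetical sketch: collapsing duplicate entity nodes in a caption-derived
# knowledge graph, so the same vehicle seen by two cameras becomes one node.
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    label: str                          # e.g. "white delivery van"
    cameras: set = field(default_factory=set)

def merge_duplicates(nodes, edges):
    """Collapse nodes that share a label into one canonical node,
    rewriting edges to point at the surviving node."""
    canonical = {}                      # label -> surviving Node
    remap = {}                          # old node_id -> surviving node_id
    for n in nodes:
        if n.label in canonical:
            keep = canonical[n.label]
            keep.cameras |= n.cameras   # keep the union of sightings
            remap[n.node_id] = keep.node_id
        else:
            canonical[n.label] = n
            remap[n.node_id] = n.node_id
    # Rewriting edges through the remap also removes duplicate edges.
    merged_edges = {(remap[s], rel, remap[t]) for (s, rel, t) in edges}
    return list(canonical.values()), merged_edges

nodes = [
    Node("n1", "white delivery van", {"cam_1"}),
    Node("n2", "white delivery van", {"cam_2"}),  # same van, second camera
    Node("n3", "loading dock", {"cam_2"}),
]
edges = {("n1", "drives_past", "n3"), ("n2", "parks_at", "n3")}
merged_nodes, merged_edges = merge_duplicates(nodes, edges)
```

After the merge, both sightings of the van resolve to one node that carries both camera IDs, which is what makes cross-camera tracking queries answerable from the graph.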
An LLM agent can now decompose user questions, use tools to search the graph, reinspect video frames if needed, and iterate until it delivers a precise answer. This approach supports multi-stream queries, allowing users to ask questions that span multiple camera feeds, such as tracking a person's path across a facility.

The platform now supports two graph database backends: Neo4j and ArangoDB. ArangoDB offers CUDA-accelerated graph functions through cuGraph, speeding up knowledge graph generation and retrieval. These enhancements are well suited to multi-GPU environments handling large-scale video analytics.

For low-latency edge use cases, the new VSS Event Reviewer feature enables selective, intelligent processing. Instead of running AI models continuously on all video data, Event Reviewer acts as an add-on to existing computer vision pipelines. When a CV system detects a potential event, such as a collision or intrusion, the relevant video clip is sent to VSS for deeper analysis. VSS answers predefined yes/no questions and generates alerts, while also supporting follow-up natural language queries. This selective use of generative AI drastically reduces compute costs and frees up resources for other tasks.

VSS 2.4 also expands hardware support to include NVIDIA Jetson Thor for edge deployments, NVIDIA RTX Pro 6000 workstations and servers, and upcoming support for NVIDIA DGX Spark. This flexibility allows developers to deploy the solution across a wide range of environments, from edge devices to data centers.

To get started, developers can use an NVIDIA Brev Launchable for one-click deployment with pre-built Jupyter notebooks. The full codebase, documentation, and reference workflows are available in the NVIDIA-AI-Blueprints/video-search-and-summarization GitHub repository. For production environments and cloud service provider integration, detailed guidance is provided in the VSS documentation.
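The agentic graph traversal described above can be sketched as a toy decompose/search/compose loop. The graph contents, the `search_graph` tool, and the keyword-based "decomposition" are all illustrative stand-ins for what an LLM agent and a real graph backend would do; none of these names come from the VSS API.

```python
# Toy sketch of agentic knowledge-graph Q&A: decompose the question into a
# target entity, call a graph-search tool, and compose an answer from
# evidence that spans multiple camera feeds.

GRAPH = {
    # entity -> outgoing edges: (relation, target entity, source camera)
    "person_7": [("enters", "lobby", "cam_1"), ("enters", "warehouse", "cam_4")],
    "lobby": [],
    "warehouse": [],
}

def search_graph(entity):
    """Tool the agent can call: fetch outgoing edges for an entity."""
    return GRAPH.get(entity, [])

def answer(question):
    # Step 1: decompose the question into a target entity
    # (a real agent would use an LLM; here a keyword stub).
    entity = "person_7" if "person 7" in question else None
    if entity is None:
        return "unknown entity"
    # Step 2: gather evidence with tool calls, spanning camera feeds.
    hops = [(target, cam) for rel, target, cam in search_graph(entity)
            if rel == "enters"]
    # Step 3: compose the final answer from the gathered evidence.
    return " -> ".join(f"{place} ({cam})" for place, cam in hops)

print(answer("Which rooms did person 7 enter?"))
# prints: lobby (cam_1) -> warehouse (cam_4)
```

The real agent additionally iterates, re-querying the graph or reinspecting video frames when the first round of evidence is insufficient; the sketch shows only a single pass.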
