HyperAI

NVIDIA has launched NVIDIA XR AI in public beta, addressing a critical infrastructure gap for developers building artificial intelligence agents for augmented reality glasses, extended reality headsets, and wearable devices. While AR and XR hardware has matured, integrating live camera and microphone streams with multimodal AI models, enterprise data, and device-specific runtimes remains complex. The new open-source platform provides a modular foundation that connects these devices to GPU-accelerated AI services deployed across the cloud, data centers, workstations, and edge environments. The architecture centers on an XR Media Hub that routes live video, audio, and data messages to specialized model services. NVIDIA Cosmos models handle visual grounding, while Nemotron models manage voice recognition, language reasoning, and tool calling. Enterprise connectivity and database access are facilitated through the Model Context Protocol, which allows agents to retrieve maintenance records, research protocols, or operational data. A flexible orchestration layer, compatible with frameworks like the NVIDIA NeMo Agent Toolkit, manages complex workflows. The system is designed to minimize latency and bandwidth by keeping raw video pixels in shared memory and routing only lightweight metadata. Participant identity acts as a routing boundary, enabling multi-user and multi-agent scenarios where responses are accurately directed across different clients and agents. Industry partners are already exploring practical applications in hands-busy environments. Researchers at Stanford University and Princeton University are utilizing XR AI workflows to streamline stem cell therapy research, allowing scientists to access contextual information and interact with laboratory systems without interrupting complex procedures. In manufacturing, Siemens is piloting solutions that enable field engineers to retrieve real-time maintenance data, troubleshoot equipment, and verify work outcomes on the factory floor. These deployments highlight the platform’s potential to reduce manual documentation, accelerate troubleshooting, and capture operational evidence for compliance and training. The public beta release includes a comprehensive open-source repository and a streamlined implementation guide. Developers can deploy a functional multimodal agent in minutes, enabling real-time visual understanding, voice commands, and text-to-speech responses. Additional configuration steps allow integration of enterprise systems via MCP, advanced workflow orchestration, and spatial rendering through NVIDIA CloudXR. By decoupling media transport, model inference, and client delivery, the platform enables developers to swap components and adapt deployment environments without rebuilding entire applications. This release marks a significant advancement in bringing context-aware, enterprise-grade AI directly into physical workspaces, transforming how frontline workers and specialists interact with digital information in real time.

Related Links

Related Links

Related Links

Online Tutorial | UC Berkeley/NVIDIA and Others Release Gsplat, an open-source 3DGS Library That Saves 4x GPU Memory and Reduces Training Time by 10%.

Online Tutorial | UC Berkeley/NVIDIA and Others Release Gsplat, an open-source 3DGS Library That Saves 4x GPU Memory and Reduces Training Time by 10%.

Command Palette

NVIDIA Launches XR AI Beta to Build Intelligent AR and XR Agents

Related Links

Command Palette

NVIDIA Launches XR AI Beta to Build Intelligent AR and XR Agents

Related Links

Command Palette

NVIDIA Launches XR AI Beta to Build Intelligent AR and XR Agents

Related Links

Online Tutorial | UC Berkeley/NVIDIA and Others Release Gsplat, an open-source 3DGS Library That Saves 4x GPU Memory and Reduces Training Time by 10%.

Online Tutorial | UC Berkeley/NVIDIA and Others Release Gsplat, an open-source 3DGS Library That Saves 4x GPU Memory and Reduces Training Time by 10%.