NVIDIA Launches XR AI Beta to Build Intelligent AR and XR Agents
NVIDIA has launched NVIDIA XR AI in public beta, addressing a critical infrastructure gap for developers building artificial intelligence agents for augmented reality glasses, extended reality headsets, and wearable devices. While AR and XR hardware has matured, integrating live camera and microphone streams with multimodal AI models, enterprise data, and device-specific runtimes remains complex. The new open-source platform provides a modular foundation that connects these devices to GPU-accelerated AI services deployed across the cloud, data centers, workstations, and edge environments. The architecture centers on an XR Media Hub that routes live video, audio, and data messages to specialized model services. NVIDIA Cosmos models handle visual grounding, while Nemotron models manage voice recognition, language reasoning, and tool calling. Enterprise connectivity and database access are facilitated through the Model Context Protocol, which allows agents to retrieve maintenance records, research protocols, or operational data. A flexible orchestration layer, compatible with frameworks like the NVIDIA NeMo Agent Toolkit, manages complex workflows. The system is designed to minimize latency and bandwidth by keeping raw video pixels in shared memory and routing only lightweight metadata. Participant identity acts as a routing boundary, enabling multi-user and multi-agent scenarios where responses are accurately directed across different clients and agents. Industry partners are already exploring practical applications in hands-busy environments. Researchers at Stanford University and Princeton University are utilizing XR AI workflows to streamline stem cell therapy research, allowing scientists to access contextual information and interact with laboratory systems without interrupting complex procedures. In manufacturing, Siemens is piloting solutions that enable field engineers to retrieve real-time maintenance data, troubleshoot equipment, and verify work outcomes on the factory floor. These deployments highlight the platform’s potential to reduce manual documentation, accelerate troubleshooting, and capture operational evidence for compliance and training. The public beta release includes a comprehensive open-source repository and a streamlined implementation guide. Developers can deploy a functional multimodal agent in minutes, enabling real-time visual understanding, voice commands, and text-to-speech responses. Additional configuration steps allow integration of enterprise systems via MCP, advanced workflow orchestration, and spatial rendering through NVIDIA CloudXR. By decoupling media transport, model inference, and client delivery, the platform enables developers to swap components and adapt deployment environments without rebuilding entire applications. This release marks a significant advancement in bringing context-aware, enterprise-grade AI directly into physical workspaces, transforming how frontline workers and specialists interact with digital information in real time.
