NVIDIA Launches New TensorRT Edge-LLM to Power Physical AI and Autonomous Driving Robots

NVIDIA has released an updated version of TensorRT Edge-LLM, a high-performance C++ inference runtime designed to power next-generation Physical AI. The update targets the stringent power and latency requirements of embedded platforms such as NVIDIA DRIVE AGX Thor and NVIDIA Jetson Thor. It addresses the challenge of running large language models and vision-language models on edge devices while enabling high-fidelity reasoning and real-time multimodal interaction.

A key advancement in this release is full support for Mixture of Experts (MoE) architectures at the edge. For MoE models such as Qwen3 MoE, the runtime activates only a subset of parameters per token. This gives embedded devices access to the reasoning capabilities of massive models without incurring the latency or power consumption typically associated with their full size.

The update also introduces specialized support for the Nemotron 2 Nano model, which uses a hybrid Mamba-2-Transformer architecture. This design significantly reduces the memory footprint by minimizing KV cache storage while maintaining high precision, enabling complex retrieval-augmented generation and agentic workflows on resource-constrained hardware.

For voice interaction, the runtime now supports the Qwen3-TTS and Qwen3-ASR models. These native multimodal models employ a Thinker-Talker architecture to handle end-to-end speech processing directly on the chip. Traditional pipelines cascade separate automatic speech recognition, language, and text-to-speech models; the end-to-end approach avoids those handoffs and so reduces latency. This enables seamless, interruptible conversations between drivers and their vehicles, as well as natural dialogue for humanoid robots.

For robotics, the update adds support for Cosmos Reason 2, an open reasoning vision-language model tailored for physical AI. The model allows embodied agents to understand world dynamics using physical common sense and chain-of-thought reasoning without requiring human annotations. By accelerating Cosmos Reason 2, TensorRT Edge-LLM enables robots to plan actions safely and in real time through complex, long-tail physical scenarios.

The release also facilitates the transition from traditional modular stacks to end-to-end Vision-Language-Action (VLA) models in autonomous driving. This includes preparation for the forthcoming Alpamayo 1 workflow, which aims to distill System 2 rational thinking onto the edge. Alpamayo 1 uses a Cosmos Reason backbone to generate causation traces before outputting precise driving trajectories, moving beyond simple scene description to active planning.

TensorRT Edge-LLM remains a pure C++ runtime with no Python dependencies, which ensures the predictable memory footprint essential for mission-critical automotive and robotics applications. Developers can access the new capabilities, including examples for MoE and the Alpamayo framework, through the updated GitHub repository or the latest NVIDIA DriveOS releases. This infrastructure provides the foundation for building the next generation of autonomous machines capable of complex reasoning and real-time interaction.
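
To make the Mixture of Experts point above concrete, here is a minimal sketch of generic top-k expert routing, the mechanism by which an MoE layer touches only a subset of its parameters for each token. It is an illustration under stated assumptions, not TensorRT Edge-LLM's API or the actual Qwen3 MoE router: the function names, the count of 8 experts, the choice of 2 active experts, and the toy "expert" functions are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(router_logits, top_k):
    """Pick the top_k experts for one token and renormalize their gate weights."""
    probs = softmax(router_logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    mass = sum(probs[i] for i in chosen)
    return [(i, probs[i] / mass) for i in chosen]


def moe_layer(token, experts, router_logits, top_k):
    """Run only the selected experts and mix their outputs by gate weight."""
    output = [0.0] * len(token)
    for expert_id, gate in route_token(router_logits, top_k):
        expert_out = experts[expert_id](token)  # unselected experts never execute
        output = [acc + gate * val for acc, val in zip(output, expert_out)]
    return output


# Hypothetical toy setup (assumption, not from the article): 8 experts,
# each a simple elementwise scaling standing in for a feed-forward block.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=scale: [s * v for v in x] for scale in range(1, NUM_EXPERTS + 1)]

rng = random.Random(0)
token = [0.5, -1.0, 2.0]
router_logits = [rng.uniform(-1.0, 1.0) for _ in range(NUM_EXPERTS)]

selected = route_token(router_logits, TOP_K)          # [(expert_id, gate), ...]
mixed = moe_layer(token, experts, router_logits, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS                 # share of experts run per token
```

In this toy configuration only 2 of 8 experts execute for a given token, so the per-token compute in the expert layers is a quarter of what a dense layer of the same total size would need. Real routers are learned, and production runtimes batch tokens per expert; the sketch omits both.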
