NVIDIA Unveils Three Neural Breakthroughs Enhancing Robot Learning in Simulation and Real-World Tasks
NVIDIA Research has unveiled three neural breakthroughs that advance robot learning and enable more capable, adaptable machines. These innovations, highlighted at CoRL 2025, address core challenges in simulating complex robot dynamics, teaching dexterous manipulation from human motion, and enabling precise bimanual assembly using vision and touch.

The first advancement, NeRD (Neural Robot Dynamics), is a learned dynamics model that predicts the future states of robots under contact constraints. Unlike traditional simulators, which struggle with high degrees of freedom and complex contact mechanics, NeRD uses a robot-centric, spatially invariant state representation to achieve high accuracy and data efficiency. Trained on just 100,000 random trajectories, NeRD leverages a lightweight GPT-2 Transformer architecture and integrates seamlessly with simulation frameworks such as NVIDIA Warp. It achieves less than 0.1% error in accumulated reward over 1,000 timesteps and enables zero-shot sim-to-real transfer, demonstrated on a Franka robot arm. Fine-tuning on real-world data further closes the gap between simulation and reality, accelerating robotics research.

The second breakthrough, Reference-Scoped Exploration (RSE), tackles the challenge of teaching robots human-level dexterity. Traditional methods rely on error-prone, multi-stage workflows of retargeting, tracking, and correction. RSE replaces these with a unified, single-loop optimization that treats human motion-capture data as soft guidance rather than rigid ground truth. This lets robots adapt movements to their own physical constraints while preserving the intent of the demonstration. A state-based imitation policy trained with RSE is then distilled into a vision-based generative control policy.
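To make the soft-guidance idea concrete, here is a minimal, hypothetical sketch of a reward term in which deviation from the motion-capture reference is free within a tolerance band and penalized smoothly beyond it. The function name, the `scope` parameter, and the shaping constant are illustrative assumptions, not RSE's actual formulation.

```python
import math

def soft_guidance_reward(joint_pos, ref_pos, scope=0.15, task_reward=0.0):
    """Sketch of a 'soft guidance' term: deviations from the human
    motion-capture reference are free inside a tolerance band (the
    'scope') and penalized smoothly outside it, so the robot can
    adapt the motion to its own constraints. Illustrative only."""
    penalty = 0.0
    for q, q_ref in zip(joint_pos, ref_pos):
        # Only the out-of-scope part of the tracking error counts.
        excess = max(0.0, abs(q - q_ref) - scope)
        penalty += excess ** 2
    # Exponential shaping keeps the guidance term in (0, 1].
    guidance = math.exp(-5.0 * penalty)
    return task_reward + guidance

# Inside the tolerance band: full guidance reward.
print(soft_guidance_reward([0.1, -0.2], [0.0, -0.1]))            # -> 1.0
# Outside the band: penalized smoothly, not zeroed out.
print(round(soft_guidance_reward([0.5, -0.2], [0.0, -0.1]), 3))  # -> 0.542
```

A rigid tracking objective would penalize any deviation from the reference; the band is what turns the demonstration into guidance rather than ground truth.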
This vision-based policy uses a single-view depth image and sparse goals to perform diverse manipulation tasks, such as picking up a banana, cup, or binoculars, achieving nearly 20% higher success rates than baselines on both the Inspire and Allegro robotic hands.

The third innovation, VT-Refine, enables precise bimanual assembly by combining vision and tactile feedback. Human assembly tasks often involve occlusions, so both visual and touch input are needed. VT-Refine introduces a real-to-sim-to-real framework: it starts with a small set of real-world demonstrations (around 30 episodes) to pretrain a visuo-tactile diffusion policy, which is then fine-tuned in a parallelized simulation environment using reinforcement learning, leveraging a digital twin with GPU-accelerated tactile simulation via TacSL. The system uses point clouds from an ego-centric camera, tactile sensor feedback, and joint positions to train robust policies. After RL fine-tuning, real-world success rates improve by about 20% for vision-only setups and 40% for visuo-tactile ones; despite a 5–10% drop from sim-to-real transfer, the gains are substantial. This marks one of the first successful large-scale sim-to-real transfers for bimanual visuo-tactile policies.

Together, NeRD, RSE, and VT-Refine represent a leap forward in making robots more intelligent, adaptive, and capable in real-world environments. These tools give developers powerful, scalable techniques to train robots for complex, contact-rich tasks, and they underscore NVIDIA’s commitment to advancing physical AI through simulation, data-driven learning, and multimodal perception.

For more details, explore the research being presented at CoRL and Humanoids in Seoul, Korea, from September 27 to October 2. Join the 2025 BEHAVIOR Challenge, a benchmark for reasoning, locomotion, and manipulation featuring 50 household tasks and 10,000 teleoperated demonstrations.
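As a rough illustration of VT-Refine's pipeline, the sketch below fuses the three observation modalities the post mentions (ego-centric point clouds, tactile readings, joint positions) and runs a behavior-cloning-style pretraining pass over a handful of demonstrations. The names are hypothetical, the linear "policy" is a stand-in for the actual visuo-tactile diffusion model, and the RL fine-tuning stage in simulation is only noted, not implemented.

```python
def fuse_observation(point_cloud, tactile, joint_pos):
    """Flatten and concatenate the three modalities into one policy
    input. A real system would use learned encoders (e.g., a
    point-cloud network); plain flattening is a simplification."""
    flat_points = [coord for point in point_cloud for coord in point]
    return flat_points + list(tactile) + list(joint_pos)

class LinearPolicy:
    """Stand-in for the visuo-tactile diffusion policy (illustrative)."""
    def __init__(self, obs_dim):
        self.weights = [0.0] * obs_dim

    def act(self, obs):
        return sum(w * x for w, x in zip(self.weights, obs))

    def update(self, obs, target_action, lr=0.5):
        # One least-mean-squares step toward the demonstrated action.
        err = target_action - self.act(obs)
        for i, x in enumerate(obs):
            self.weights[i] += lr * err * x

# Stage 1 of the real-to-sim-to-real loop: pretrain on a small set of
# real demonstrations (the post cites about 30 episodes; 3 shown here).
demos = [
    (fuse_observation([(0.1, 0.2, 0.3)], [0.0, 0.9], [0.5, -0.5]), 1.0),
    (fuse_observation([(0.2, 0.1, 0.4)], [0.8, 0.1], [0.4, -0.4]), -1.0),
    (fuse_observation([(0.0, 0.3, 0.2)], [0.5, 0.5], [0.6, -0.6]), 0.5),
]
policy = LinearPolicy(obs_dim=len(demos[0][0]))
for _ in range(200):            # repeated passes over the demonstrations
    for obs, action in demos:
        policy.update(obs, action)
# Stage 2 (RL fine-tuning in a parallelized simulation with tactile
# sensing, as with TacSL) would refine this pretrained policy further.
print(round(policy.act(demos[0][0]), 2))   # close to the demonstrated 1.0
```

The point of the two-stage split is that the scarce real data only has to bootstrap the policy; the cheap, parallel simulation then supplies the experience needed to make it robust.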
Stay updated by subscribing to the NVIDIA Robotics newsletter and following NVIDIA Robotics on YouTube, Discord, and the NVIDIA Developer Forums. Start your robotics journey with the free NVIDIA Robotics Fundamentals courses.

Acknowledgments go to Arsalan Mousavian, Balakumar Sundaralingam, Binghao Huang, Dieter Fox, Eric Heiden, Iretiayo Akinola, Jie Xu, Liang-Yan Gui, Liuyu Bian, Miles Macklin, Rowland O’Flaherty, Sirui Xu, Wei Yang, Xiaolong Wang, Yashraj Narang, Yunzhu Li, Yu-Wei Chao, and Yu-Xiong Wang for their contributions.
