Atlas Robot Acquires New Skills from Humans Using Advanced AI Model

Boston Dynamics and the Toyota Research Institute (TRI) have announced a major breakthrough in robotics and artificial intelligence, a pivotal step toward general-purpose humanoid robots. Through their collaboration, the companies have integrated a Large Behavior Model (LBM), a powerful AI system, into Boston Dynamics' Atlas humanoid robot, enabling it to learn new skills directly from human demonstrations and to perform complex, multi-step tasks in dynamic environments.

Traditionally, robots have relied on hand-coded, task-specific instructions that are labor-intensive to develop and fragile in real-world settings; even minor environmental changes can cause failure. Russ Tedrake, Senior Vice President of Large Behavior Models at TRI, highlighted the core challenge: "A key value proposition of humanoid robots is their ability to operate in human environments and perform diverse tasks, but conventional programming simply cannot scale to meet this demand." The new LBM addresses this by allowing Atlas to acquire new skills through observation, reducing the need for extensive manual coding. As the model improves, it requires fewer human demonstrations to achieve robust behavior. This shift represents a fundamental change in how robots are designed and controlled.

At the heart of the advancement is an end-to-end AI strategy that unifies perception, decision-making, and control in a single neural network. Unlike older approaches that treat walking, balancing, and manipulation as separate modules, this unified system controls Atlas's entire body, arms and legs included, enabling coordinated, full-body actions.

In a demonstration video, Atlas performs a sequence of intricate tasks involving multiple objects and actions. It approaches a cart filled with parts for Spot, Boston Dynamics' quadruped robot, grasps a leg, folds it precisely, and places it on a nearby shelf. It then picks up other components, opens a drawer, and stores them inside. After clearing the cart, it turns to a large blue bin filled with disorganized parts, picks up handfuls, and transfers them to a second cart. The entire sequence of walking, grasping, folding, placing, and opening drawers flows seamlessly.

What sets this performance apart is Atlas's adaptability. During the demo, researchers introduced unexpected disruptions: a box lid was suddenly closed, and a part was dropped. Instead of freezing or failing, the robot responded intelligently, attempting to reopen the lid or bending down to retrieve the fallen piece. These recovery behaviors emerge not from pre-programmed contingency code, but from the model's training on diverse, real-world scenarios that include such disruptions.

The foundation of this capability is a data collection and training pipeline built around a virtual reality (VR) teleoperation system. Operators wear VR headsets to see the world through Atlas's cameras and use controllers to guide the robot through tasks in real time. The system records high-fidelity data, including RGB video, proprioceptive information (robot joint states), and high-level language commands, creating a rich, multimodal dataset. This data is used to train a 450-million-parameter neural network based on a diffusion transformer architecture. The model learns to map language instructions to coordinated physical actions, enabling autonomous task execution.
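To make the pipeline more concrete, the sketch below shows one plausible shape for a recorded demonstration step and a heavily simplified diffusion-style action sampler in Python. Everything here, the field names, dimensions, and the toy denoising update, is an illustrative assumption; the actual TRI and Boston Dynamics interfaces have not been published in this form.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical schema for one step of a recorded teleoperation demo.
# Field names and shapes are illustrative guesses, not the actual
# Boston Dynamics / TRI data format.
@dataclass
class DemoStep:
    rgb: np.ndarray               # (H, W, 3) frame from Atlas's cameras
    joint_positions: np.ndarray   # proprioception: measured joint angles
    joint_velocities: np.ndarray
    language_command: str         # e.g. "place the leg on the shelf"
    operator_action: np.ndarray   # joint targets commanded by the VR operator


def sample_action_chunk(denoiser, observation, horizon=16, action_dim=30,
                        num_steps=50, rng=None):
    """Toy diffusion-policy sampler.

    Starts from Gaussian noise and iteratively denoises it into a short
    sequence ("chunk") of future actions, conditioned on the current
    observation. `denoiser` stands in for the trained diffusion
    transformer; the update rule below is deliberately simplified
    (a real sampler follows a DDPM/DDIM noise schedule).
    """
    rng = rng or np.random.default_rng()
    actions = rng.standard_normal((horizon, action_dim))
    for t in reversed(range(num_steps)):
        predicted_noise = denoiser(actions, observation, t)
        actions = actions - predicted_noise / num_steps
    return actions  # execute a few steps, then re-plan from new observations
```

Predicting a chunk of future actions and re-planning from fresh observations, rather than emitting one action at a time, is a common design choice in diffusion policies and is consistent with the fluid, recoverable behavior described above.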
Crucially, the team adopted a "generalist policy" approach. Rather than training separate models for each task, they combined data from multiple sources, including full Atlas robots, a torso-only test platform, and other TRI datasets, into a single, unified model. This enables the robot to generalize across tasks and objects, from rigid tools to soft fabrics and heavy tires, without reprogramming. The shared learning framework also accelerates development: improvements in one area can be transferred to others, and strategies can be applied across different robot platforms.

An unexpected benefit emerged in performance: the trained model can execute actions 1.5 to 2 times faster than the original human demonstrations, without retraining (a simplified sketch of one way this could work appears at the end of this article). In some cases, the robot outperforms the human operator.

The progress was made possible by a closed-loop development system combining high-fidelity simulation with physical testing. AI policies are first validated in simulation, allowing rapid iteration and reducing the risk of damage to expensive hardware.

While this represents a significant leap, challenges remain. Scaling up the LBM approach depends on efficiently gathering vast amounts of high-quality training data, an ongoing hurdle. But the integration of human demonstrations, end-to-end learning, and generalization across tasks brings Atlas closer than ever to the vision of a truly adaptable, human-like robot.
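As a closing illustration of the speed-up noted above: because the policy outputs a time-indexed trajectory of actions, one plausible way to run it faster than the demonstrations, without retraining, is simply to resample the trajectory onto a compressed clock at playback. The sketch below shows that idea; it is an assumption about the mechanism, since the article does not specify how the speed-up is achieved.

```python
import numpy as np

def retime_action_chunk(actions, dt=0.1, speedup=1.5):
    """Resample a predicted action chunk onto a faster clock.

    `actions` has shape (horizon, action_dim). The returned chunk traces
    the same motion but spans roughly 1/speedup of the original
    wall-clock time. Illustrative only; joint velocity and torque limits
    would have to be checked before running faster on real hardware.
    """
    horizon = actions.shape[0]
    t_orig = np.arange(horizon) * dt                          # original timestamps
    t_fast = np.arange(0.0, t_orig[-1] + 1e-9, dt * speedup)  # compressed clock
    return np.stack(
        [np.interp(t_fast, t_orig, actions[:, d])
         for d in range(actions.shape[1])],
        axis=1,
    )
```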
