New Breakthrough in Dexterous Hand Skill Transfer: MANIPTRANS Framework Released
To transfer human dexterity to robots effectively, a research team from the Institute of Advanced Research (IAR) and its partners has developed and open-sourced the MANIPTRANS framework along with a large-scale dataset. The approach addresses two challenges that typically prevent direct skill transfer from humans to robots: morphological differences between human and robotic hands, and accumulated motion errors.

MANIPTRANS works in two stages. The first stage learns human hand movement patterns from large-scale motion-capture (MoCap) data. This phase tracks only the motion trajectories of the human hands, temporarily ignoring their interaction with objects. Trained with Proximal Policy Optimization (PPO) under objectives that emphasize tracking precision and smoothness, the resulting "motion imitator" mimics human hand movements and retargets them to robotic hands of different shapes. This stage bridges much of the morphological gap and provides a stable, natural motion foundation for subsequent tasks.

The first stage solves the "appearance" of the motion, but the generated actions may still violate physical laws or fail to manipulate objects effectively. The second stage therefore introduces constraints from the physical world to achieve precise, stable interaction. MANIPTRANS uses residual learning: a "residual module" is trained to adjust the imitator's initial motions for high precision and coordination. The module receives rich state information, including real-time object states (position, velocity, shape, and so on) and simulated friction and contact forces, and computes the small corrections needed on top of the initial motion.
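The article does not give the stage-one training objective in detail. As a rough illustration of a reward that "emphasizes tracking precision and smoothness", the following sketch combines a fingertip-tracking term with a joint-smoothness term; the function name, weights, and decay constants are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np

def tracking_reward(robot_fingertips, target_fingertips,
                    joint_vel, prev_joint_vel,
                    w_pos=1.0, w_smooth=0.05):
    """Hypothetical per-step reward for a stage-one motion imitator.

    Rewards close tracking of the retargeted human fingertip positions
    and penalizes jerky joint motion. All weights are assumptions.
    """
    # Tracking term: exponentiated mean distance between robot fingertips
    # and the human reference (both arrays of shape [n_fingertips, 3]).
    pos_err = np.linalg.norm(robot_fingertips - target_fingertips, axis=-1).mean()
    r_track = np.exp(-5.0 * pos_err)

    # Smoothness term: penalize large joint accelerations between steps.
    accel = np.linalg.norm(joint_vel - prev_joint_vel)
    r_smooth = np.exp(-accel)

    return w_pos * r_track + w_smooth * r_smooth
```

A PPO learner would maximize the expected sum of such rewards, so the imitator is pushed toward trajectories that are both accurate and fluid even before any object is introduced.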
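The stage-two idea of residual learning can be sketched as follows: the frozen stage-one imitator proposes a base action, and a small learned module, conditioned on object state and contact forces, outputs a bounded correction. The `residual_policy` callable and `residual_scale` bound below are assumptions for illustration, not the framework's actual interface.

```python
import numpy as np

def compose_action(base_action, obj_state, contact_forces,
                   residual_policy, residual_scale=0.1):
    """Illustrative stage-two action composition (assumed interface).

    base_action    : joint-space action proposed by the frozen imitator
    obj_state      : object position/velocity/shape features from simulation
    contact_forces : simulated friction and contact-force readings
    residual_policy: learned module mapping the observation to a correction
    """
    # The residual module sees both the proposed motion and the physical state.
    obs = np.concatenate([base_action, obj_state, contact_forces])
    delta = residual_policy(obs)

    # Bound the correction so the refined action stays close to the
    # natural motion produced in stage one.
    delta = np.clip(delta, -1.0, 1.0) * residual_scale
    return base_action + delta
```

Keeping the correction small is what preserves the fluidity of the imitated motion while still letting the robot satisfy contact and force constraints.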
The final output retains the natural fluidity of the first stage while ensuring that the actions satisfy physical constraints, allowing the robot to grasp objects firmly, apply appropriate force, and even coordinate both hands. "As we kept improving the residual policy, we successfully enabled the robotic hands to perform a coordinated task: the left hand picks up a pen, the right hand holds the pen body, and the pen is naturally inserted into the cap. This operation requires not only precise grip strength but also a high degree of coordination between the two hands. Everyone on the team felt a great sense of accomplishment. This success proved the effectiveness of our MANIPTRANS method in overcoming the challenges of skill transfer," recalled Xiaolin Li, a member of the research team.

The strength of the two-stage design lies in breaking an inherently complex learning problem into two simpler sub-problems. By first establishing a solid motion foundation and then fine-tuning the interactions, MANIPTRANS greatly reduces the action space that must be explored, markedly improving training efficiency and final performance. This makes it especially effective for transferring complex skills, particularly bimanual operations, which are traditionally difficult to handle.

Based on the MANIPTRANS framework, the research team constructed the DexManipNet dataset, built from representative hand-object interaction sources such as FAVOR and OakInk-V2. The dataset currently contains 3.3K sequences of robotic hand manipulation covering 1.2K different objects, totaling approximately 134 million frames. About 600 of these sequences involve complex bimanual tasks, spanning 61 task types such as capping a pen, opening a bottle, and performing chemistry-experiment operations.
"DexManipNet stands out as the most extensive and diverse dataset currently supporting complex bimanual operations in robotics. We have good reason to believe that, based on this dataset, we can train a range of robotic manipulation models and achieve more versatile, dexterous, and high-precision bimanual operation in both virtual and real-world environments," Xiaolin Li stated. This work represents significant progress in robotics, making it possible to transfer intricate human skills to robots more efficiently and accurately. The open-source release of MANIPTRANS and the DexManipNet dataset invites collaboration from the broader research community and could lead to more advanced and capable robotic systems.
