DragMesh-2: Physically Plausible Dexterous Hand-Object Interaction with Articulated Objects
Tianshan Zhang Yijia Duan Yanjun Li Zeyu Zhang Hao Tang
Abstract
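Dexterous interaction with articulated objects is important for manipulation in household, assistive, and humanoid robots, and multi-finger hands can provide adaptive contact patterns beyond parallel-jaw grasping. However, unlike static-object manipulation, articulated object manipulation cannot directly actuate the target part: its motion must arise through sustained physical hand-handle contact. This makes the transition from object-centric articulation generation to hand-driven dexterous hand-object interaction non-trivial, because geometric trajectory replay and open-loop execution cannot model the contact dynamics required to move articulated parts. Moreover, policies trained solely for task completion under fixed dynamics, especially without tactile or force feedback, may overfit to nominal contact loads and degrade when the contact load changes. To address these challenges, we propose DragMesh-2, a contact-driven framework for dexterous interaction with articulated objects that extends the scope of articulated interaction from object-centric generation to hand-driven dexterous hand-object interaction, in which articulated motion must arise through physical contact. We further propose PICA, a physically informed, contact-aware training mechanism that injects physical signals into policy learning without tactile or force feedback, improving robustness and task success under varying contact loads. Finally, to evaluate robustness under contact-load variation, we conduct a systematic evaluation across multiple damping conditions and articulated object categories. We also provide a pure-geometry dexterous interaction resource to support future research on loco-manipulation and humanoid hand-object interaction. In evaluations on seven GAPartNet objects, DragMesh-2 achieves stronger robustness to contact-load variation than competing methods while maintaining high task success across all damping conditions.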
One-sentence Summary
DragMesh-2 presents a contact-driven framework for dexterous hand-object interaction with articulated objects that replaces open-loop trajectory replay with sustained physical contact, utilizes the PICA training mechanism to inject physical signals without tactile or force feedback, and demonstrates robust manipulation across seven GAPartNet objects under varying damping conditions.
Key Contributions
- DragMesh-2 is a contact-driven framework that transitions articulated object manipulation from object-centric trajectory generation to hand-driven dexterous interaction where motion emerges through sustained physical contact. This approach explicitly models the contact dynamics required to actuate articulated parts without relying on geometric replay or open-loop execution.
- PICA is a physically informed training mechanism that leverages short-horizon interaction history to inject physical signals into policy learning without tactile or force feedback. By integrating task rewards with separate action-bound and contact-preserving regularization terms, the method stabilizes multi-finger coordination under fluctuating contact loads.
- Systematic evaluations across seven GAPartNet objects and multiple damping conditions demonstrate that the framework achieves stronger robustness to contact-load variation than competing methods while maintaining high task success. The release of a pure-geometry dexterous interaction dataset further supports future loco-manipulation and humanoid hand-object interaction research.
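The reward structure described for PICA (a task reward combined with separate action-bound and contact-preserving regularization terms) can be illustrated with a minimal sketch. The summary does not give the actual formulas, so the function name, the penalty shapes, the weights, and the margin below are all assumptions chosen for illustration, not the authors' implementation.

```python
from typing import Sequence


def shaped_reward(
    task_reward: float,
    action: Sequence[float],
    in_contact: bool,
    action_limit: float = 1.0,  # assumed symmetric action bound
    w_bound: float = 0.1,       # assumed weight, not from the paper
    w_contact: float = 0.5,     # assumed weight, not from the paper
    margin: float = 0.9,        # assumed fraction of the bound where the penalty starts
) -> float:
    """Task reward minus two separate regularization terms (illustrative only)."""
    threshold = margin * action_limit
    # Action-bound term: penalize only the part of each component beyond the threshold.
    bound_penalty = sum(max(0.0, abs(a) - threshold) ** 2 for a in action)
    # Contact-preserving term: a flat penalty whenever hand-handle contact is lost.
    contact_penalty = 0.0 if in_contact else 1.0
    return task_reward - w_bound * bound_penalty - w_contact * contact_penalty
```

Keeping the two terms separate, rather than folding them into the task reward, makes it possible to tune or ablate each one independently.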
Introduction
Dexterous manipulation of articulated objects is essential for advanced robotics applications like household assistants and humanoid systems, as multi-finger hands enable compliant contact patterns that traditional grippers cannot achieve. However, prior approaches struggle because articulated parts cannot be directly controlled and must move through sustained physical interaction. Existing methods typically rely on open-loop trajectory replay or reinforcement learning trained under fixed dynamics, which causes policies to overfit nominal contact loads and fail when interaction conditions change. To address these limitations, the authors introduce DragMesh-2, a contact-driven framework that generates articulated motion exclusively through real-time hand-handle interaction. They further propose PICA, a physically informed training mechanism that injects contact-aware signals and dynamics randomization into policy learning, enabling robust manipulation under varying loads without requiring tactile or force sensors.
Dataset
- Dataset Composition and Sources: The authors generate a reference contact trajectory dataset directly from the GAPartNet geometry library using a purely heuristic approach. The collection comprises 277 trajectories distributed across seven GAPartNet categories, preserving the original category proportions and heavily featuring StorageFurniture objects.
- Subset Details and Structure: Each trajectory follows a four-phase interaction sequence: approach, grasp, drag, and release. The generation process filters for objects with valid part, handle, and joint-mobility annotations, then combines these geometric cues with a SMPL-X hand model to ensure all wrist and finger motions align with the target joint constraints.
- Data Processing and Storage: A geometry-guided synthesis procedure creates the dataset, with all outputs saved as JSON files containing per-frame wrist poses and finger configurations. This storage format decouples the data from any specific policy or physics backend, making the trajectories fully regenerable for any compatible GAPartNet model.
- Usage in the Model: The authors utilize this unified dataset in three capacities within their DragMesh-2 framework. It initializes the expert grasp states and sets the target motion scales for the contact-driven reinforcement learning task, serves as a fixed non-learned baseline for trajectory tracking evaluation, and is publicly released as a geometry-only interaction resource for future loco-manipulation studies.
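To make the storage format concrete, the sketch below parses one trajectory and checks the four-phase order. The summary only states that the JSON files hold per-frame wrist poses and finger configurations, so every key name (`category`, `frames`, `phase`, `wrist_pose`, `finger_config`), the pose layout, and the phase labels are hypothetical and should be checked against the released files.

```python
import json
from itertools import groupby
from typing import List

# Phase order described in the text; the string labels themselves are assumed.
PHASES = ["approach", "grasp", "drag", "release"]

# Hypothetical file contents: all key names and values are placeholders.
EXAMPLE_JSON = """
{
  "category": "StorageFurniture",
  "frames": [
    {"phase": "approach", "wrist_pose": [0.0, 0.0, 0.3, 1.0, 0.0, 0.0, 0.0], "finger_config": [0.0, 0.0]},
    {"phase": "grasp",    "wrist_pose": [0.0, 0.0, 0.2, 1.0, 0.0, 0.0, 0.0], "finger_config": [0.6, 0.6]},
    {"phase": "drag",     "wrist_pose": [0.1, 0.0, 0.2, 1.0, 0.0, 0.0, 0.0], "finger_config": [0.6, 0.6]},
    {"phase": "drag",     "wrist_pose": [0.2, 0.0, 0.2, 1.0, 0.0, 0.0, 0.0], "finger_config": [0.6, 0.6]},
    {"phase": "release",  "wrist_pose": [0.2, 0.0, 0.3, 1.0, 0.0, 0.0, 0.0], "finger_config": [0.0, 0.0]}
  ]
}
"""


def phase_sequence(trajectory: dict) -> List[str]:
    """Collapse per-frame phase labels into the ordered list of distinct phases."""
    return [phase for phase, _ in groupby(f["phase"] for f in trajectory["frames"])]


def follows_phase_order(trajectory: dict) -> bool:
    """True if the frames run through approach, grasp, drag, release exactly once."""
    return phase_sequence(trajectory) == PHASES


trajectory = json.loads(EXAMPLE_JSON)
```

Because a file of this kind carries no simulator state, the same check applies regardless of which policy or physics backend later consumes the trajectory.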
Experiment
The evaluation benchmarks learned contact-driven policies against trajectory replay and geometric primitives across multiple articulated objects under nominal, mild, and strong out-of-distribution damping conditions. Ablation and diagnostic studies validate that robust manipulation requires the synergistic combination of explicit physical regularization and temporal contact-response modeling, as nominal success metrics frequently mask underlying action saturation and stability collapse. Furthermore, experiments demonstrate that extended training or broader damping distributions alone cannot overcome strong-load failures, establishing that reliable checkpoint selection must prioritize out-of-distribution robustness and that sustained progress will depend on enriching the contact interface with direct force feedback.
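As a rough illustration of this protocol, per-condition success rates can be tabulated from episode outcomes. The record layout below (a condition label paired with a success flag) is an assumption made for the sketch and is not taken from the authors' evaluation code.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def success_by_condition(episodes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return the success rate for each damping condition.

    `episodes` is an iterable of (condition, succeeded) pairs, for example
    ("strong", False). The labels are assumed names for the damping settings.
    """
    counts = defaultdict(lambda: [0, 0])  # condition -> [successes, total]
    for condition, succeeded in episodes:
        counts[condition][0] += int(succeeded)
        counts[condition][1] += 1
    return {condition: s / n for condition, (s, n) in counts.items()}
```

Reporting each condition separately, rather than a single pooled rate, is what exposes the gap between nominal and strong-damping performance.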
The training-length study reveals that extending base policy optimization improves nominal performance but severely compromises robustness under strong damping. As training epochs increase, the policy drifts toward action saturation, with saturation metrics approaching their upper bounds, and task success and progress drop sharply under high contact loads. Applying the physical-structure fine-tuning modules mitigates this collapse, preserving high success rates and stable action distributions. In both deterministic and stochastic evaluation, the fine-tuned variants outperform the longer-trained base policies on robustness and progress metrics.
The authors examine how training duration influences policy robustness under different contact-load conditions. Longer training improves success rates under nominal damping but causes a sharp decline in performance when damping is increased. The drop correlates with action saturation, which rises consistently as training continues, suggesting that extended training pushes the policy into a saturated regime that lacks robustness. Nominal success metrics can therefore be misleading about a policy's ability to handle high contact loads.
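The summary does not define how action saturation is measured. One plausible reading, sketched here as an assumption, is the fraction of action components that sit at or near the actuator bound over a rollout; the symmetric bound and the tolerance are illustrative choices.

```python
from typing import Sequence


def saturation_rate(
    actions: Sequence[Sequence[float]],
    limit: float = 1.0,  # assumed symmetric action bound
    tol: float = 0.01,   # assumed tolerance: within 1% of the bound counts as saturated
) -> float:
    """Fraction of action components whose magnitude is within `tol` of the bound."""
    total = 0
    saturated = 0
    for action in actions:
        for component in action:
            total += 1
            if abs(component) >= limit * (1.0 - tol):
                saturated += 1
    return saturated / total if total else 0.0
```

Tracking a quantity like this alongside success rate would make a drift toward saturated control visible even while nominal success stays high.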
The authors investigate how extending fine-tuning affects policy robustness under varying damping conditions. Nominal success rates remain high and training rewards increase, but performance under higher damping degrades significantly. Action saturation metrics rise steadily with training epochs, indicating a drift toward a saturated, low-robustness policy that compromises out-of-distribution stability. Selecting checkpoints solely by training reward is therefore insufficient for out-of-distribution robustness: the reward-best checkpoint exhibits degraded performance.
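The checkpoint-selection point can be sketched as follows. The selection rule shown (maximize the worst-case success rate across damping conditions) is one possible robustness-first criterion, not necessarily the one the authors use, and the numbers are made-up placeholders rather than results from the paper.

```python
from typing import Dict, List

# Placeholder metrics for illustration only; these are not reported results.
checkpoints: List[Dict] = [
    {"epoch": 100, "train_reward": 10.0,
     "success": {"nominal": 0.90, "mild": 0.85, "strong": 0.70}},
    {"epoch": 300, "train_reward": 14.0,
     "success": {"nominal": 0.95, "mild": 0.60, "strong": 0.20}},
]


def select_by_reward(cps: List[Dict]) -> Dict:
    """Pick the checkpoint with the highest training reward."""
    return max(cps, key=lambda c: c["train_reward"])


def select_by_worst_case(cps: List[Dict]) -> Dict:
    """Pick the checkpoint whose weakest damping condition is strongest.

    Ties are broken by nominal success.
    """
    return max(cps, key=lambda c: (min(c["success"].values()), c["success"]["nominal"]))
```

With these placeholder values the two rules disagree: the reward-based rule returns the epoch-300 checkpoint, while the worst-case rule returns the epoch-100 checkpoint.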
The authors evaluate contact-driven articulated object manipulation across damping levels and find that the proposed method achieves the highest mean success rate across all damping multipliers and execution modes, outperforming trajectory replay, geometric primitives, and standard learned baselines. Temporal encoders alone are insufficient for robust contact control: physical structure signals and temporal modeling must be combined to prevent saturation under strong damping. Without explicit contact-aware regularization, extended training and broader damping ranges often degrade out-of-distribution robustness, so nominal success metrics can be misleading.
The authors evaluate whether expanding the training damping distribution enhances out-of-distribution robustness during fine-tuning. Broadening the damping range fails to produce stable gains under strong damping and instead causes noticeable performance degradation across nominal and mild damping settings. Adjusting the training damping interval alone therefore provides limited benefit within the current control framework.
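For readers unfamiliar with damping randomization, a minimal sketch of per-episode sampling is given below. The uniform distribution and the helper names are assumptions, since the summary does not state how the training interval is sampled or what its bounds are.

```python
import random
from typing import Callable


def make_damping_sampler(low: float, high: float, seed: int = 0) -> Callable[[], float]:
    """Return a function that draws one damping multiplier per training episode.

    Assumes a uniform distribution over [low, high]; the actual distribution
    used in the paper is not specified in this summary.
    """
    if not 0.0 < low <= high:
        raise ValueError("expected 0 < low <= high")
    rng = random.Random(seed)

    def sample() -> float:
        return rng.uniform(low, high)

    return sample


def is_out_of_distribution(multiplier: float, low: float, high: float) -> bool:
    """True if an evaluation multiplier lies outside the training interval."""
    return not (low <= multiplier <= high)
```

Widening `[low, high]` turns a previously out-of-distribution multiplier into an in-distribution one, which is the adjustment the study finds insufficient on its own.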
The experiments evaluate the impact of extended training duration, prolonged fine-tuning, and expanded damping distributions on policy stability, revealing that longer optimization consistently degrades out-of-distribution robustness despite improving nominal success rates. This performance collapse is driven by action saturation, which renders reward-based checkpoint selection and broader training ranges ineffective for handling high contact loads. In contrast, integrating physical structure signals with temporal encoders successfully prevents saturation and maintains high success rates across all execution modes. Ultimately, the study demonstrates that explicit contact-aware regularization is essential for robust manipulation, as relying on extended training or nominal metrics alone compromises stability under strong damping.