Learning joint reconstruction of hands and manipulated objects

Estimating hand-object manipulations is essential for interpreting and imitating human actions. Previous work has made significant progress toward reconstructing hand poses and object shapes in isolation. Yet reconstructing hands and objects during manipulation is a more challenging task due to significant occlusions of both the hand and the object. While manipulation presents challenges, it may also simplify the problem, since the physics of contact restricts the space of valid hand-object configurations. For example, during manipulation the hand and object should be in contact but should not interpenetrate. In this work, we regularize the joint reconstruction of hands and objects with manipulation constraints. We present an end-to-end learnable model that exploits a novel contact loss favoring physically plausible hand-object constellations. Using RGB images as input, our approach improves grasp quality metrics over baselines. To train and evaluate the model, we also propose ObMan, a new large-scale synthetic dataset of hand-object manipulations. Finally, we demonstrate that models trained on ObMan transfer to real data.
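
The contact loss is only named above; the sketch below gives one plausible form of such a term, assuming PyTorch and access to a signed distance function for the object (negative inside the surface). The names (contact_loss, attract_thresh, repulse_weight) and the sphere SDF are illustrative assumptions, not the paper's actual formulation.

```python
# A minimal sketch of a contact-style regularizer in the spirit of the
# loss described above: an attraction term pulls hand vertices that are
# already near the object onto its surface, and a repulsion term
# penalizes interpenetration. All names, thresholds, and the sphere SDF
# are illustrative assumptions, not the paper's exact formulation.
import torch


def contact_loss(hand_verts, obj_sdf, attract_thresh=0.01, repulse_weight=1.0):
    """hand_verts: (B, N, 3) hand mesh vertices in meters.
    obj_sdf: callable mapping (B, N, 3) points to (B, N) signed
    distances to the object surface (negative inside the object)."""
    sd = obj_sdf(hand_verts)  # (B, N) signed distances
    # Attraction: vertices within a small band around the surface are
    # pulled onto it, encouraging firm contact during manipulation.
    near_surface = (sd.abs() < attract_thresh).float()
    attraction = (sd.abs() * near_surface).sum(dim=1).mean()
    # Repulsion: penalize penetration depth, i.e. how far vertices
    # lie inside the object (negative signed distance).
    penetration = (-sd).clamp(min=0.0)
    repulsion = penetration.sum(dim=1).mean()
    return attraction + repulse_weight * repulsion


# Toy usage with a sphere standing in for the object's signed distance
# field; 778 vertices matches the MANO hand mesh common in this line of work.
def sphere_sdf(points, radius=0.05):
    return points.norm(dim=-1) - radius


hand_verts = 0.05 * torch.randn(2, 778, 3, requires_grad=True)
loss = contact_loss(hand_verts, sphere_sdf)
loss.backward()  # gradients flow back to the hand vertices
```

Because both terms are differentiable with respect to the predicted vertices, a regularizer of this kind can be trained jointly with the reconstruction losses in an end-to-end model, as the abstract describes.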