Ego3DPose: Capturing 3D Cues from Binocular Egocentric Views

Kang, Taeho; Lee, Kyungjin; Zhang, Jinrui; Lee, Youngki
Abstract

We present Ego3DPose, a highly accurate binocular egocentric 3D pose reconstruction system. The binocular egocentric setup is practical and useful in various applications, yet it remains largely under-explored. It suffers from low pose estimation accuracy due to viewing distortion, severe self-occlusion, and the limited field of view of the joints in egocentric 2D images. We notice that two important 3D cues contained in the egocentric binocular input, stereo correspondences and perspective, are neglected. Current methods rely heavily on 2D image features and learn 3D information only implicitly, which introduces biases toward commonly observed motions and lowers overall accuracy. We observe that they fail not only in challenging occlusion cases but also when estimating visible joint positions.

To address these challenges, we propose two novel approaches. First, we design a two-path network architecture with a path that estimates the pose of each limb independently from its binocular heatmaps. Without access to full-body information, this path alleviates the bias toward the trained full-body pose distribution. Second, we leverage the egocentric view of body limbs, which exhibits strong perspective variance (e.g., a hand appears significantly larger when it is close to the camera). We propose a new perspective-aware representation using trigonometry, enabling the network to estimate the 3D orientation of limbs. Finally, we develop an end-to-end pose reconstruction network that synergizes both techniques.

Our comprehensive evaluations demonstrate that Ego3DPose outperforms state-of-the-art models, reducing pose estimation error (MPJPE) by 23.1% on the UnrealEgo dataset. Our qualitative results highlight the superiority of our approach across a range of scenarios and challenges.
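The two 3D cues the abstract names, stereo correspondence and perspective-dependent limb orientation, can be illustrated with a minimal sketch. This is not the paper's method (Ego3DPose learns these cues from binocular heatmaps inside a network); it is classical pinhole-stereo triangulation plus a trigonometric angle representation of a limb vector, with all function names and camera parameters assumed for illustration:

```python
import numpy as np

def triangulate(left_xy, right_xy, focal, baseline):
    """Recover a 3D point from one stereo correspondence.

    Standard rectified-stereo geometry: depth is inversely
    proportional to horizontal disparity. Inputs are normalized
    image coordinates; `focal` and `baseline` are assumed camera
    parameters, not values from the paper.
    """
    disparity = left_xy[0] - right_xy[0]
    z = focal * baseline / disparity      # depth from disparity
    x = left_xy[0] * z / focal            # back-project to 3D
    y = left_xy[1] * z / focal
    return np.array([x, y, z])

def limb_orientation(parent_3d, child_3d):
    """Express a limb's 3D direction as trigonometric angles
    (azimuth, elevation) rather than raw joint coordinates,
    in the spirit of a perspective-aware representation."""
    v = child_3d - parent_3d
    azimuth = np.arctan2(v[0], v[2])
    elevation = np.arctan2(v[1], np.hypot(v[0], v[2]))
    return azimuth, elevation

# Toy usage: same joint seen by both cameras of a stereo pair.
joint = triangulate(np.array([0.5, 0.2]), np.array([0.4, 0.2]),
                    focal=1.0, baseline=0.1)
angles = limb_orientation(np.zeros(3), joint)
```

An angle-based encoding like this is scale-free: a hand that looms large near the camera and the same hand at arm's length map to the same limb direction, which is one way to see why a trigonometric representation can help the network separate 3D orientation from apparent 2D size.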