SimpleEgo: Predicting Probabilistic Body Pose from Egocentric Cameras

Our work addresses the problem of egocentric human pose estimation from downwards-facing cameras on head-mounted devices (HMDs). This presents a challenging scenario, as parts of the body often fall outside of the image or are occluded. Previous solutions minimize this problem by using fish-eye camera lenses to capture a wider view, but these can present hardware design issues. They also predict 2D heat-maps per joint and lift them to 3D space to deal with self-occlusions, but this requires large network architectures which are impractical to deploy on resource-constrained HMDs. We predict pose from images captured with conventional rectilinear camera lenses. This resolves hardware design issues, but means body parts are often out of frame. As such, we directly regress probabilistic joint rotations represented as matrix Fisher distributions for a parameterized body model. This allows us to quantify pose uncertainties and explain out-of-frame or occluded joints. It also removes the need to compute 2D heat-maps and allows for simplified DNN architectures which require less compute. Given the lack of egocentric datasets using rectilinear camera lenses, we introduce SynthEgo, a synthetic dataset with 60K stereo images containing high diversity of pose, shape, clothing and skin tone. Our approach achieves state-of-the-art results for this challenging configuration, reducing mean per-joint position error by 23% overall and 58% for the lower body. Our architecture also has eight times fewer parameters and runs twice as fast as the current state-of-the-art. Experiments show that training on our synthetic dataset leads to good generalization to real-world images without fine-tuning.
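The matrix Fisher distribution mentioned above is a standard probability distribution over 3D rotations, with density p(R) ∝ exp(tr(Fᵀ R)) parameterized by an unconstrained 3×3 matrix F whose singular values encode concentration (i.e., certainty) about each axis. As a minimal illustrative sketch (not the paper's actual implementation), the mode of the distribution can be recovered from a regressed F via SVD, with a sign correction to guarantee a proper rotation:

```python
import numpy as np

def matrix_fisher_mode(F):
    """Mode of a matrix Fisher distribution p(R) ∝ exp(tr(F^T R)) over SO(3).

    Computed from the SVD of the parameter matrix F; the sign correction
    on the last singular direction ensures det(R) = +1 (a proper rotation).
    """
    U, _, Vt = np.linalg.svd(F)
    S = np.diag([1.0, 1.0, np.linalg.det(U) * np.linalg.det(Vt)])
    return U @ S @ Vt

# Larger singular values of F mean a more concentrated (more certain)
# distribution; here F is strongly concentrated around the identity rotation.
F = 10.0 * np.eye(3)
R = matrix_fisher_mode(F)  # → identity rotation
```

In an egocentric setting, a network regressing F per joint can express low confidence for out-of-frame joints simply by outputting an F with small singular values, which is one reason this parameterization suits occlusion-heavy views.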