Learning 3D Human Dynamics from Video

Kanazawa, Angjoo ; Zhang, Jason Y. ; Felsen, Panna ; Malik, Jitendra
Abstract

From an image of a person in action, we can easily guess the 3D motion of the person in the immediate past and future. This is because we have a mental model of 3D human dynamics that we have acquired from observing visual sequences of humans in motion. We present a framework that can similarly learn a representation of 3D dynamics of humans from video via a simple but effective temporal encoding of image features. At test time, from video, the learned temporal representation gives rise to smooth 3D mesh predictions. From a single image, our model can recover the current 3D mesh as well as its 3D past and future motion. Our approach is designed so it can learn from videos with 2D pose annotations in a semi-supervised manner. Though annotated data is always limited, there are millions of videos uploaded daily on the Internet. In this work, we harvest this Internet-scale source of unlabeled data by training our model on unlabeled video with pseudo-ground truth 2D pose obtained from an off-the-shelf 2D pose detector. Our experiments show that adding more videos with pseudo-ground truth 2D pose monotonically improves 3D prediction performance. We evaluate our model, Human Mesh and Motion Recovery (HMMR), on the recent challenging dataset of 3D Poses in the Wild and obtain state-of-the-art performance on the 3D prediction task without any fine-tuning. The project website with video, code, and data can be found at https://akanazawa.github.io/human_dynamics/.
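
The abstract does not specify the form of the temporal encoding, so the following is only an illustrative sketch of the general idea: each frame's image feature vector is mixed with those of its temporal neighbours, giving a representation that varies smoothly over time. It assumes a 1D convolution along the time axis with a fixed smoothing kernel, whereas a trained model would learn its weights. The function name `temporal_encode`, the kernel values, and the edge-padding choice are hypothetical and not taken from the paper.

```python
from typing import List, Sequence


def temporal_encode(
    frame_features: Sequence[Sequence[float]],
    kernel: Sequence[float] = (0.25, 0.5, 0.25),
) -> List[List[float]]:
    """Mix each frame's feature vector with its temporal neighbours.

    This is a 1D convolution along the time axis with replicate padding,
    so the output has exactly one vector per input frame.
    The fixed kernel is an assumption made for illustration only.
    """
    if len(kernel) % 2 == 0:
        raise ValueError("kernel length must be odd")
    num_frames = len(frame_features)
    if num_frames == 0:
        return []

    half = len(kernel) // 2
    dim = len(frame_features[0])
    encoded: List[List[float]] = []
    for t in range(num_frames):
        mixed = [0.0] * dim
        for k, weight in enumerate(kernel):
            # Clamp the index so the first and last frames reuse themselves.
            src = min(max(t + k - half, 0), num_frames - 1)
            for d in range(dim):
                mixed[d] += weight * frame_features[src][d]
        encoded.append(mixed)
    return encoded


# Hypothetical one-dimensional features for three frames.
features = [[0.0], [4.0], [0.0]]
smoothed = temporal_encode(features)
```

In this toy input the spike in the middle frame is spread across its neighbours, which is the kind of temporal smoothing that a sequence-level representation makes possible and that independent per-frame predictions lack.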