Vid2Avatar: 3D Avatar Reconstruction from Videos in the Wild via Self-supervised Scene Decomposition

Guo, Chen; Jiang, Tianjian; Chen, Xu; Song, Jie; Hilliges, Otmar
Abstract

We present Vid2Avatar, a method to learn human avatars from monocular in-the-wild videos. Reconstructing humans that move naturally from monocular in-the-wild videos is difficult: it requires accurately separating humans from arbitrary backgrounds, and it requires reconstructing detailed 3D surfaces from short video sequences, which makes the problem even more challenging. Despite these challenges, our method requires neither ground-truth supervision nor priors extracted from large datasets of clothed human scans, nor does it rely on any external segmentation modules. Instead, it solves the tasks of scene decomposition and surface reconstruction directly in 3D by jointly modeling both the human and the background in the scene, parameterized via two separate neural fields. Specifically, we define a temporally consistent human representation in canonical space and formulate a global optimization over the background model, the canonical human shape and texture, and per-frame human pose parameters. A coarse-to-fine sampling strategy for volume rendering and novel objectives are introduced to cleanly separate the dynamic human from the static background, yielding detailed and robust 3D human geometry reconstructions. We evaluate our method on publicly available datasets and show improvements over prior art.
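To make the two-field decomposition concrete, the following is a minimal NumPy sketch of how samples along a camera ray from a dynamic "human" field and a static "background" field can be composited into a single pixel color via standard volume rendering. All function and variable names here are illustrative assumptions, not the paper's actual implementation; the paper's exact formulation, sampling strategy, and objectives differ.

```python
import numpy as np

def composite_two_fields(sigma_h, sigma_b, color_h, color_b, deltas):
    """Alpha-composite N ray samples whose total density is the sum of a
    human field (sigma_h, color_h) and a background field (sigma_b, color_b).
    Illustrative sketch only -- not the paper's actual renderer.

    sigma_h, sigma_b: (N,) non-negative densities per sample
    color_h, color_b: (N, 3) RGB colors per sample
    deltas:           (N,) distances between consecutive samples
    """
    sigma = sigma_h + sigma_b                         # combined density
    alpha = 1.0 - np.exp(-sigma * deltas)             # opacity per sample
    # transmittance: probability the ray reaches each sample unoccluded
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                           # rendering weights

    eps = 1e-8
    # blend each sample's color by the relative density of the two fields
    blend = (sigma_h[:, None] * color_h + sigma_b[:, None] * color_b) \
            / (sigma[:, None] + eps)
    pixel = (weights[:, None] * blend).sum(axis=0)    # final RGB

    # soft human opacity along the ray -- usable as a decomposition mask
    human_mask = (weights * sigma_h / (sigma + eps)).sum()
    return pixel, human_mask
```

A ray passing mostly through the human field yields a mask value near 1, while a ray hitting only background yields a value near 0; supervising such masks jointly with the reconstruction is one way a clean human/background separation can emerge without an external segmentation module.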
