Human-VDM: Learning Single-Image 3D Human Gaussian Splatting from Video Diffusion Models

Generating lifelike 3D humans from a single RGB image remains a challenging task in computer vision, as it requires accurate modeling of geometry, high-quality texture, and plausible unseen parts. Existing methods typically use multi-view diffusion models for 3D generation, but they often suffer from inconsistent views, which hinders high-quality 3D human generation. To address this, we propose Human-VDM, a novel method for generating 3D humans from a single RGB image using Video Diffusion Models. Human-VDM provides temporally consistent views for 3D human generation using Gaussian Splatting. It consists of three modules: a view-consistent human video diffusion module, a video augmentation module, and a Gaussian Splatting module. First, a single image is fed into the human video diffusion module to generate a coherent human video. Next, the video augmentation module applies super-resolution and video frame interpolation to enhance the texture and geometric smoothness of the generated video. Finally, the 3D Human Gaussian Splatting module learns a lifelike human under the guidance of these high-resolution, view-consistent frames. Experiments demonstrate that Human-VDM generates high-quality 3D humans from a single image, outperforming state-of-the-art methods in both qualitative and quantitative evaluations. Project page: https://human-vdm.github.io/Human-VDM/
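
To make the three-stage pipeline concrete, the following is a minimal Python sketch of the data flow the abstract describes. All function names and signatures are hypothetical (the paper does not specify this interface), and each learned module is replaced by a trivial numpy stand-in; only the stage ordering, single image → video → augmented video → Gaussian Splatting, follows the text.

```python
import numpy as np

# Hypothetical stage interfaces; the actual Human-VDM modules are learned
# networks, not these numpy stand-ins.

def human_video_diffusion(image: np.ndarray, num_frames: int = 24) -> list[np.ndarray]:
    """Stage 1: generate a temporally consistent human video from one image.
    Stand-in: repeats the input frame; the real module is a video diffusion model."""
    return [image.copy() for _ in range(num_frames)]

def augment_video(frames: list[np.ndarray], sr_scale: int = 2,
                  interp_factor: int = 2) -> list[np.ndarray]:
    """Stage 2: super-resolution plus frame interpolation.
    Stand-in: nearest-neighbor upsampling and linear frame blending."""
    upscaled = [f.repeat(sr_scale, axis=0).repeat(sr_scale, axis=1) for f in frames]
    out = []
    for a, b in zip(upscaled, upscaled[1:]):
        out.append(a)
        for k in range(1, interp_factor):
            t = k / interp_factor
            out.append(((1 - t) * a + t * b).astype(a.dtype))
    out.append(upscaled[-1])
    return out

def fit_gaussian_splatting(frames: list[np.ndarray]) -> dict:
    """Stage 3: optimize a 3D Gaussian Splatting representation against the
    augmented frames. Stand-in: returns a summary instead of real Gaussians."""
    return {"num_views": len(frames), "resolution": frames[0].shape[:2]}

if __name__ == "__main__":
    rgb = np.zeros((256, 256, 3), dtype=np.float32)  # single input RGB image
    video = human_video_diffusion(rgb)               # stage 1: video generation
    video = augment_video(video)                     # stage 2: SR + interpolation
    human_3d = fit_gaussian_splatting(video)         # stage 3: 3D optimization
    print(human_3d)
```

The design point the sketch captures is that the augmentation stage sits between generation and reconstruction: super-resolution sharpens texture supervision while interpolation densifies the view coverage, so the Gaussian Splatting stage fits against more and cleaner frames than the diffusion model alone would provide.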