SIFU: Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction

Creating high-quality 3D models of clothed humans from single images for real-world applications is crucial. Despite recent advancements, accurately reconstructing humans in complex poses or with loose clothing from in-the-wild images, along with predicting textures for unseen areas, remains a significant challenge. A key limitation of previous methods is their insufficient prior guidance in transitioning from 2D to 3D and in texture prediction. In response, we introduce SIFU (Side-view Conditioned Implicit Function for Real-world Usable Clothed Human Reconstruction), a novel approach combining a Side-view Decoupling Transformer with a 3D Consistent Texture Refinement pipeline. SIFU employs a cross-attention mechanism within the transformer, using SMPL-X normals as queries to effectively decouple side-view features while mapping 2D features to 3D. This method improves not only the precision of the 3D models but also their robustness, especially when SMPL-X estimates are imperfect. Our texture refinement process leverages a text-to-image diffusion prior to generate realistic and consistent textures for invisible views. Through extensive experiments, SIFU surpasses state-of-the-art methods in both geometry and texture reconstruction, showing enhanced robustness in complex scenarios and achieving unprecedented Chamfer and P2S measurements. Our approach extends to practical applications such as 3D printing and scene building, demonstrating its broad utility in real-world scenarios. Project page: https://river-zhang.github.io/SIFU-projectpage/
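The abstract describes a cross-attention mechanism in which features derived from SMPL-X normals act as queries attending over 2D image features. As a rough, self-contained sketch of that attention pattern only (not SIFU's actual architecture; all shapes, names, and the single-head NumPy formulation are illustrative assumptions):

```python
import numpy as np

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention (illustrative).

    queries: (Nq, d) -- e.g. tokens derived from SMPL-X side-view normals
    keys, values: (Nk, d) -- e.g. 2D image features being lifted to 3D
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (Nq, Nk) similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ values                         # (Nq, d) attended output

# Toy example: 4 hypothetical "normal" query tokens attend over
# 16 hypothetical image-feature tokens of dimension 32.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 32))
k = rng.standard_normal((16, 32))
v = rng.standard_normal((16, 32))
out = cross_attention(q, k, v)
print(out.shape)  # (4, 32)
```

In this reading, conditioning the queries on side-view SMPL-X normals lets the attention weights select which 2D image features are relevant to each (otherwise unobserved) side view, which is one plausible way a "decoupling" step could be realized.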