Abstract

Remarkable progress has been made in 3D reconstruction from single-view RGB-D inputs. MCC is the current state-of-the-art method in this field, which achieves unprecedented success by combining vision Transformers with large-scale training. However, we identified two key limitations of MCC: 1) the Transformer decoder is inefficient in handling a large number of query points; 2) the 3D representation struggles to recover high-fidelity details. In this paper, we propose a new approach called NU-MCC that addresses these limitations. NU-MCC includes two key innovations: a Neighborhood decoder and a Repulsive Unsigned Distance Function (Repulsive UDF). First, our Neighborhood decoder introduces center points as an efficient proxy of input visual features, allowing each query point to attend only to a small neighborhood. This design not only results in much faster inference but also enables the exploitation of finer-scale visual features for improved recovery of 3D textures. Second, our Repulsive UDF is a novel alternative to the occupancy field used in MCC, significantly improving the quality of 3D object reconstruction. Compared to standard UDFs, which suffer from holes in their results, our proposed Repulsive UDF achieves more complete surface reconstruction. Experimental results demonstrate that NU-MCC is able to learn a strong 3D representation, significantly advancing the state of the art in single-view 3D reconstruction. In particular, it outperforms MCC by 9.7% in terms of the F1-score on the CO3D-v2 dataset while running more than 5x faster.
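
To make the neighborhood idea concrete, the following is a minimal NumPy sketch of a query point aggregating features only from its k nearest center points. It is an illustration under stated assumptions, not the NU-MCC decoder: the function name `neighborhood_attention`, the parameter `k`, and the distance-based softmax weights are all hypothetical stand-ins for the learned attention the abstract describes.

```python
import numpy as np

def neighborhood_attention(queries, centers, center_feats, k=4):
    """Aggregate features from the k nearest center points of each query.

    queries:      (Q, 3) query point coordinates
    centers:      (C, 3) center point coordinates
    center_feats: (C, D) one feature vector per center point
    Returns a (Q, D) array of aggregated features.
    """
    # Pairwise squared distances between queries and centers: (Q, C)
    d2 = ((queries[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    # Indices of the k nearest centers for each query: (Q, k)
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]
    near_d2 = np.take_along_axis(d2, idx, axis=1)
    # ASSUMPTION: weights are a softmax over negative squared distance,
    # standing in for learned query-key attention scores.
    logits = -near_d2
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum over the k neighbor features only: (Q, D)
    return (weights[:, :, None] * center_feats[idx]).sum(axis=1)

rng = np.random.default_rng(0)
queries = rng.normal(size=(5, 3))
centers = rng.normal(size=(10, 3))
center_feats = rng.normal(size=(10, 8))
out = neighborhood_attention(queries, centers, center_feats, k=4)
print(out.shape)
```

The aggregation step touches k features per query rather than all C, which is the source of the saving the abstract refers to. This sketch still computes a dense Q-by-C distance matrix to find the neighbors, so a practical version would need a cheaper neighbor search.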
