3DVNet: Multi-View Depth Prediction and Volumetric Refinement

We present 3DVNet, a novel multi-view stereo (MVS) depth-prediction method that combines the advantages of previous depth-based and volumetric MVS approaches. Our key idea is the use of a 3D scene-modeling network that iteratively updates a set of coarse depth predictions, resulting in highly accurate predictions which agree on the underlying scene geometry. Unlike existing depth-prediction techniques, our method uses a volumetric 3D convolutional neural network (CNN) that operates in world space on all depth maps jointly. The network can therefore learn meaningful scene-level priors. Furthermore, unlike existing volumetric MVS techniques, our 3D CNN operates on a feature-augmented point cloud, allowing for effective aggregation of multi-view information and flexible iterative refinement of depth maps. Experimental results show our method exceeds state-of-the-art accuracy in both depth prediction and 3D reconstruction metrics on the ScanNet dataset, as well as a selection of scenes from the TUM-RGBD and ICL-NUIM datasets. This shows that our method is both effective and generalizes to new settings.
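The core idea above can be illustrated with a minimal sketch: depth maps are back-projected into a joint world-space point cloud, a scene-level model predicts per-view depth residuals from that cloud, and the depth maps are updated iteratively. This is not the paper's implementation; `predict_residual` stands in for the learned 3D CNN, and all function names, the pinhole back-projection, and the loop structure are illustrative assumptions.

```python
import numpy as np

def backproject(depth, K_inv, pose):
    """Lift one depth map to world-space 3D points (simple pinhole model)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # 3 x N pixel rays
    cam_pts = (K_inv @ pix) * depth.reshape(1, -1)                     # points in camera frame
    cam_h = np.vstack([cam_pts, np.ones((1, cam_pts.shape[1]))])       # homogeneous coords
    return (pose @ cam_h)[:3].T                                        # N x 3 world points

def refine_depths(depths, K_inv, poses, predict_residual, n_iters=3):
    """Iteratively update a set of coarse depth maps.

    predict_residual is a placeholder for a learned scene-modeling network:
    it sees the joint world-space point cloud from ALL views, so updates can
    be consistent with the shared underlying scene geometry.
    """
    depths = [d.copy() for d in depths]
    for _ in range(n_iters):
        # Aggregate every view into one world-space point cloud.
        cloud = np.concatenate(
            [backproject(d, K_inv, p) for d, p in zip(depths, poses)], axis=0)
        # Apply a per-view residual update conditioned on the joint cloud.
        for i, d in enumerate(depths):
            depths[i] = d + predict_residual(cloud, d)
    return depths
```

With a toy residual that nudges each depth toward a shared target, repeated iterations pull all views toward agreement, which is the refinement behavior the abstract describes.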