DELTAS: Depth Estimation by Learning Triangulation And densification of Sparse points
Multi-view stereo (MVS) is the golden mean between the accuracy of active depth sensing and the practicality of monocular depth estimation. Cost volume based approaches employing 3D convolutional neural networks (CNNs) have considerably improved the accuracy of MVS systems. However, this accuracy comes at a high computational cost, which impedes practical adoption. Distinct from cost volume approaches, we propose an efficient depth estimation approach by first (a) detecting and evaluating descriptors for interest points, then (b) learning to match and triangulate a small set of interest points, and finally (c) densifying this sparse set of 3D points using CNNs. An end-to-end network efficiently performs all three steps within a deep learning framework and is trained with intermediate 2D image and 3D geometric supervision, along with depth supervision. Crucially, our first step complements pose estimation using interest point detection and descriptor learning. We demonstrate state-of-the-art results on depth estimation with lower compute for different scene lengths. Furthermore, our method generalizes to new environments, and the descriptors output by our network compare favorably to strong baselines. Code is available at https://github.com/magicleap/DELTAS
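
To make step (b) concrete, the following is a minimal sketch of what triangulating one matched interest point means geometrically. It uses the classical midpoint method, a swapped-in textbook technique and not the learned triangulation described above. It assumes each matched keypoint has already been converted to a viewing ray (camera center plus direction) in a shared world frame using the known poses. The function name `triangulate_midpoint` and all helpers are hypothetical and do not come from the DELTAS code base.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _point_on_ray(c: Vec3, d: Vec3, t: float) -> Vec3:
    return (c[0] + t * d[0], c[1] + t * d[1], c[2] + t * d[2])


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + t1*d1 and c2 + t2*d2.

    Assumption: c1, c2 are camera centers and d1, d2 are viewing directions of
    one matched interest point, all expressed in the same world frame.
    """
    w = _sub(c1, c2)
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w), _dot(d2, w)
    denom = a * c - b * b  # zero when the two rays are parallel
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is unobservable")
    # Closed-form minimizer of |(c1 + t1*d1) - (c2 + t2*d2)|^2 over t1, t2.
    t1 = (b * e - c * d) / denom
    t2 = (a * e - b * d) / denom
    p1 = _point_on_ray(c1, d1, t1)
    p2 = _point_on_ray(c2, d2, t2)
    return (0.5 * (p1[0] + p2[0]), 0.5 * (p1[1] + p2[1]), 0.5 * (p1[2] + p2[2]))


# Usage: two cameras one unit apart observing the same 3D point.
point = triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 2.0, 5.0),
                             (1.0, 0.0, 0.0), (0.0, 2.0, 5.0))
```

Running this for every matched interest point gives a sparse set of 3D points, which is the kind of input a densification stage such as step (c) would consume. With noisy matches the two rays do not intersect exactly, which is why the midpoint of the shortest connecting segment is used as the estimate in this sketch.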