D2-Net: A Trainable CNN for Joint Detection and Description of Local Features

In this work we address the problem of finding reliable pixel-level correspondences under difficult imaging conditions. We propose an approach where a single convolutional neural network plays a dual role: It is simultaneously a dense feature descriptor and a feature detector. By postponing the detection to a later stage, the obtained keypoints are more stable than their traditional counterparts based on early detection of low-level structures. We show that this model can be trained using pixel correspondences extracted from readily available large-scale SfM reconstructions, without any further annotations. The proposed method obtains state-of-the-art performance on both the difficult Aachen Day-Night localization dataset and the InLoc indoor localization benchmark, as well as competitive performance on other benchmarks for image matching and 3D reconstruction.
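
The dual role described above can be illustrated with a minimal sketch. It assumes a dense feature map of shape (C, H, W) has already been produced by some CNN backbone. The function name `detect_and_describe`, the 3x3 neighbourhood, and the specific detection rule (a pixel is kept when its strongest channel is also a spatial local maximum in that channel) are illustrative assumptions, not details stated in this abstract.

```python
import numpy as np


def detect_and_describe(feature_map):
    """Hypothetical sketch: derive keypoints and descriptors from one feature map.

    feature_map: array of shape (C, H, W), assumed to come from a CNN backbone.
    Returns (keypoints, descriptors): keypoints is (N, 2) as (row, col),
    descriptors is (N, C) with unit L2 norm per row.
    """
    feature_map = np.asarray(feature_map, dtype=np.float64)
    C, H, W = feature_map.shape

    # Descriptor role: the C-dimensional vector at each pixel, L2-normalised.
    norms = np.linalg.norm(feature_map, axis=0, keepdims=True)
    descriptors = feature_map / np.maximum(norms, 1e-12)

    # Detector role (assumed rule): find the strongest channel at each pixel.
    best_channel = feature_map.argmax(axis=0)  # shape (H, W)

    # Per-channel 3x3 spatial maximum, padding with -inf at the borders.
    padded = np.pad(feature_map, ((0, 0), (1, 1), (1, 1)),
                    constant_values=-np.inf)
    local_max = np.full_like(feature_map, -np.inf)
    for dy in range(3):
        for dx in range(3):
            local_max = np.maximum(local_max, padded[:, dy:dy + H, dx:dx + W])

    # Ties count as maxima, so a perfectly flat map would mark every pixel.
    is_local_max = feature_map >= local_max  # shape (C, H, W)

    # Keep a pixel when its strongest channel is also a local maximum there.
    rows, cols = np.indices((H, W))
    detected = is_local_max[best_channel, rows, cols]  # shape (H, W)

    keypoints = np.argwhere(detected)  # (N, 2) as (row, col)
    kp_descriptors = descriptors[:, keypoints[:, 0], keypoints[:, 1]].T
    return keypoints, kp_descriptors
```

Because detection reads the same feature map that supplies the descriptors, keypoints are selected from high-level responses rather than from low-level image structure, which is the sense in which detection is postponed to a later stage.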