Decoupling Makes Weakly Supervised Local Feature Better

Weakly supervised learning can help local feature methods overcome the obstacle of acquiring a large-scale dataset with densely labeled correspondences. However, since weak supervision cannot distinguish the losses caused by the detection and description steps, directly conducting weakly supervised learning within a joint describe-then-detect pipeline suffers from limited performance. In this paper, we propose a decoupled describe-then-detect pipeline tailored for weakly supervised local feature learning. Within our pipeline, the detection step is decoupled from the description step and postponed until discriminative and robust descriptors are learned. In addition, we introduce a line-to-window search strategy to explicitly use the camera pose information for better descriptor learning. Extensive experiments show that our method, PoSFeat (Camera Pose Supervised Feature), outperforms previous fully and weakly supervised methods and achieves state-of-the-art performance on a wide range of downstream tasks.
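
The abstract does not spell out how the line-to-window search works, so the following is only a minimal sketch of the standard geometry through which a known relative camera pose can constrain correspondence search. It assumes that the "line" refers to the epipolar line, which is our reading and not something the abstract confirms. All function names, the intrinsics `K1` and `K2`, and the pose convention `X2 = R @ X1 + t` are hypothetical choices made for illustration and are not taken from PoSFeat.

```python
import numpy as np

def skew(t):
    """Return the 3x3 cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    tx, ty, tz = t
    return np.array([[0.0, -tz, ty],
                     [tz, 0.0, -tx],
                     [-ty, tx, 0.0]])

def fundamental_from_pose(K1, K2, R, t):
    """Fundamental matrix F with x2^T F x1 = 0.

    Assumed convention (not from the paper): a 3D point X1 in camera-1
    coordinates maps to camera-2 coordinates as X2 = R @ X1 + t.
    """
    E = skew(t) @ R  # essential matrix
    return np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

def epipolar_line(F, pt1):
    """Epipolar line (a, b, c) in image 2 for pixel pt1 = (u, v) in image 1.

    The line is scaled so that a^2 + b^2 = 1, which makes a*u + b*v + c
    a signed distance in pixels.
    """
    x1 = np.array([pt1[0], pt1[1], 1.0])
    line = F @ x1
    return line / np.linalg.norm(line[:2])

def point_line_distance(line, pt2):
    """Unsigned pixel distance from pt2 = (u, v) to a normalized line (a, b, c)."""
    return abs(line[0] * pt2[0] + line[1] * pt2[1] + line[2])
```

Under these assumptions, a candidate match in the second image can be kept or rejected by thresholding `point_line_distance`, which reduces the search for a correspondence from the whole image to a band around one line.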