Global Proxy-based Hard Mining for Visual Place Recognition

Learning deep representations for visual place recognition is commonly performed using pairwise or triplet loss functions that depend heavily on the hardness of the examples sampled at each training iteration. Existing techniques address this by using computationally and memory expensive offline hard mining, which consists of identifying, at each iteration, the hardest samples from the training set. In this paper we introduce a new technique that performs global hard mini-batch sampling based on proxies. To do so, we add a new end-to-end trainable branch to the network, which generates efficient place descriptors (one proxy for each place). These proxy representations are then used to construct a global index that encompasses the similarities between all places in the dataset, allowing for highly informative mini-batch sampling at each training iteration. Our method can be used in combination with all existing pairwise and triplet loss functions with negligible additional memory and computation cost. We run extensive ablation studies and show that our technique brings new state-of-the-art performance on multiple large-scale benchmarks such as Pittsburgh, Mapillary-SLS and SPED. In particular, our method provides more than 100% relative improvement on the challenging Nordland dataset. Our code is available at https://github.com/amaralibey/GPM
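To make the proxy-based sampling idea concrete, below is a minimal PyTorch sketch (not the authors' implementation; names such as `ProxyBank` and `sample_hard_batch` are illustrative assumptions): each place is assigned a learnable proxy embedding, and a mini-batch is formed from the places whose proxies are most similar to a randomly chosen seed place, i.e. the globally hardest places to tell apart.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ProxyBank(nn.Module):
    """One learnable proxy vector per place (hypothetical module name)."""

    def __init__(self, num_places: int, dim: int = 128):
        super().__init__()
        self.proxies = nn.Parameter(torch.randn(num_places, dim))

    def forward(self, place_ids: torch.Tensor) -> torch.Tensor:
        # L2-normalized proxy descriptors for the requested places
        return F.normalize(self.proxies[place_ids], dim=-1)


def sample_hard_batch(proxies: torch.Tensor, places_per_batch: int) -> torch.Tensor:
    """Sample a mini-batch of place indices using the global proxy index:
    pick a random seed place, then keep the places whose proxies are most
    similar to it (the hardest-to-distinguish places)."""
    with torch.no_grad():
        p = F.normalize(proxies, dim=-1)
        seed = torch.randint(p.size(0), (1,)).item()
        sims = p @ p[seed]                      # cosine similarity of every place to the seed
        sims[seed] = float("inf")               # make sure the seed itself is included
        return sims.topk(places_per_batch).indices
```

In this sketch the proxies would be trained jointly with the main network (e.g. via a classification or proxy loss on the extra branch), and the sampled place indices would then drive which images enter each pairwise or triplet mini-batch; the full-dataset similarity index costs only one small matrix product per iteration, which is consistent with the negligible overhead claimed in the abstract.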