MinkLoc++: Lidar and Monocular Image Fusion for Place Recognition

Komorowski, Jacek ; Wysoczanska, Monika ; Trzcinski, Tomasz
Abstract

We introduce a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. Our descriptor, named MinkLoc++, can be used for place recognition, re-localization and loop closure in robotics and autonomous vehicle applications. We use a late fusion approach, where each modality is processed separately and the results are fused in the final part of the processing pipeline. The proposed method achieves state-of-the-art performance on standard place recognition benchmarks. We also identify the dominating modality problem that arises when training a multimodal descriptor. The problem manifests itself when the network focuses on the modality that overfits the training data more strongly. This drives the loss down during training but leads to suboptimal performance on the evaluation set. In this work we describe how to detect and mitigate this risk when using a deep metric learning approach to train a multimodal neural network. Our code is publicly available on the project website: https://github.com/jac99/MinkLocMultimodal.
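
The late fusion idea described in the abstract can be illustrated with a minimal sketch. This is not the MinkLoc++ implementation: the two branch functions are hypothetical stand-ins for the point cloud and image networks (they only average their inputs), and fusing by concatenating the two normalized descriptors is an assumption, since the abstract does not state the fusion operator. The sketch only shows the structure: each modality is processed on its own, and the results are combined at the end into a single descriptor used for nearest-neighbour place lookup.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cloud_branch(points):
    """Hypothetical stand-in for a point cloud network: per-axis mean of (x, y, z)."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(3)]


def image_branch(pixels):
    """Hypothetical stand-in for an image network: per-channel mean of (r, g, b)."""
    n = len(pixels)
    return [sum(px[c] for px in pixels) / n for c in range(3)]


def late_fusion_descriptor(points, pixels):
    """Process each modality separately, then fuse at the very end.

    Concatenation is an assumed fusion operator chosen for illustration.
    """
    cloud_desc = l2_normalize(cloud_branch(points))
    image_desc = l2_normalize(image_branch(pixels))
    return cloud_desc + image_desc


def euclidean(a, b):
    """Euclidean distance between two descriptors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_place(query_desc, database):
    """Return the key of the database descriptor closest to the query."""
    return min(database, key=lambda name: euclidean(query_desc, database[name]))
```

A place recognition query would then compute `late_fusion_descriptor` for the current sensor pair and pass it to `nearest_place` together with a dictionary of descriptors for previously visited places. Because the two halves of the fused descriptor come from separate branches, each half can also be evaluated on its own, which is one simple way to check whether a single modality dominates.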