8 months ago

Computer Vision

Multimodal Representation

Jacek Komorowski Monika Wysoczańska Tomasz Trzcinski

Abstract

We introduce a discriminative multimodal descriptor based on a pair of sensor readings: a point cloud from a LiDAR and an image from an RGB camera. Our descriptor, named MinkLoc++, can be used for place recognition, re-localization, and loop closure in robotics and autonomous vehicle applications. We use a late fusion approach, where each modality is processed separately and the results are fused in the final part of the processing pipeline. The proposed method achieves state-of-the-art performance on standard place recognition benchmarks. We also identify the dominating modality problem that arises when training a multimodal descriptor. The problem manifests itself when the network focuses on the modality that overfits more strongly to the training data. This drives the loss down during training but leads to suboptimal performance on the evaluation set. In this work, we describe how to detect and mitigate this risk when using a deep metric learning approach to train a multimodal neural network. Our code is publicly available on the project website: https://github.com/jac99/MinkLocMultimodal.
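
The late fusion idea can be illustrated with a minimal sketch in plain Python. It is not the MinkLoc++ implementation: the abstract does not state the fusion rule or the detection procedure, so concatenation of L2-normalized descriptors, the helper names (`late_fusion`, `recall_at_1`), and the toy vectors standing in for the outputs of the point cloud and image branches are all assumptions made for illustration. The sketch also shows one possible diagnostic for a dominating modality: scoring retrieval with each modality alone and with the fused descriptor on held-out data, then comparing the results.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length; an all-zero vector is returned as is."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def late_fusion(cloud_desc, image_desc):
    """Fuse per-modality descriptors by concatenation (assumed fusion rule)."""
    return l2_normalize(cloud_desc) + l2_normalize(image_desc)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, database):
    """Index of the database descriptor closest to the query."""
    return min(range(len(database)), key=lambda i: euclidean(query, database[i]))


def recall_at_1(queries, database, true_indices):
    """Fraction of queries whose nearest database entry is the correct place."""
    hits = sum(nearest(q, database) == t for q, t in zip(queries, true_indices))
    return hits / len(queries)


# Hypothetical toy descriptors for three places, one list per modality.
db_cloud = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
db_image = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]
db_fused = [late_fusion(c, i) for c, i in zip(db_cloud, db_image)]

# One hypothetical query that revisits place 0.
q_cloud = [[0.9, 0.1]]
q_image = [[0.1, 0.9]]
q_fused = [late_fusion(c, i) for c, i in zip(q_cloud, q_image)]
truth = [0]

# Per-modality and fused retrieval scores, to be compared on held-out data.
scores = {
    "cloud": recall_at_1(q_cloud, db_cloud, truth),
    "image": recall_at_1(q_image, db_image, truth),
    "fused": recall_at_1(q_fused, db_fused, truth),
}
```

On real held-out data, a fused score that is no better than the stronger single modality, while the training loss keeps falling, would be one hint that a single modality dominates. How to act on such a signal is a design decision that this sketch does not cover.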
