MambaPlace: Text-to-Point-Cloud Cross-Modal Place Recognition with Attention Mamba Mechanisms

Shang Tianyi ; Li Zhenyu ; Xu Pengjie ; Qiao Jinwei

Abstract

Vision Language Place Recognition (VLVPR) enhances robot localization performance by incorporating natural language descriptions from images. By utilizing language information, VLVPR directs robot place matching, overcoming the constraint of solely depending on vision. The essence of multimodal fusion lies in mining the complementary information between different modalities. However, general fusion methods rely on traditional neural architectures and are not well equipped to capture the dynamics of cross-modal interactions, especially in the presence of complex intra-modal and inter-modal correlations. To this end, this paper proposes a novel coarse-to-fine and end-to-end connected cross-modal place recognition framework, called MambaPlace. In the coarse localization stage, the text description and 3D point cloud are encoded by the pretrained T5 and instance encoder, respectively. They are then processed using Text Attention Mamba (TAM) and Point Clouds Mamba (PCM) for data enhancement and alignment. In the subsequent fine localization stage, the features of the text description and 3D point cloud are cross-modally fused and further enhanced through cascaded Cross Attention Mamba (CCAM). Finally, we predict the positional offset from the fused text-point cloud features, achieving the most accurate localization. Extensive experiments show that MambaPlace achieves improved localization accuracy on the KITTI360Pose dataset compared to the state-of-the-art methods.
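
The coarse-to-fine flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it contains no T5, instance encoder, TAM, PCM, or CCAM. It assumes the text and each candidate point-cloud cell have already been reduced to small embedding vectors, uses cosine similarity as a stand-in for the coarse retrieval step, and uses concatenation followed by a fixed linear map as a stand-in for cross-modal fusion and offset prediction. All function names, the `weights` matrix, and the example numbers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def coarse_retrieve(text_emb, cell_embs, k=2):
    """Coarse stage (stand-in): indices of the k cells most similar to the text."""
    ranked = sorted(
        range(len(cell_embs)),
        key=lambda i: cosine(text_emb, cell_embs[i]),
        reverse=True,
    )
    return ranked[:k]


def fine_offset(text_emb, cell_emb, weights):
    """Fine stage (stand-in): concatenate both embeddings, then apply a
    linear map with one row per output coordinate to get a 2D offset."""
    fused = list(text_emb) + list(cell_emb)
    return tuple(sum(w * f for w, f in zip(row, fused)) for row in weights)


def localize(text_emb, cell_embs, cell_centers, weights, k=2):
    """Retrieve candidate cells, then refine the best cell's center by an offset."""
    top = coarse_retrieve(text_emb, cell_embs, k)
    best = top[0]
    dx, dy = fine_offset(text_emb, cell_embs[best], weights)
    cx, cy = cell_centers[best]
    return top, (cx + dx, cy + dy)


# Hypothetical toy data: one text embedding and three candidate cells.
text_emb = [1.0, 0.0]
cell_embs = [[0.0, 1.0], [1.0, 0.1], [0.5, 0.5]]
cell_centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
# Hypothetical weights: dx reads the first text feature, dy the last cell feature.
weights = [[0.5, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]

top_cells, position = localize(text_emb, cell_embs, cell_centers, weights)
print(top_cells, position)
```

The two-stage split is the point of the sketch: the coarse stage narrows the search to a few candidate cells cheaply, and the fine stage only has to regress a small offset relative to the retrieved cell rather than a global position.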

