15 days ago

Unveiling the Potential of Segment Anything Model 2 for RGB-Thermal Semantic Segmentation with Language Guidance

Zhao, Jiayi; Teng, Fei; Luo, Kai; Zhao, Guoqiang; Li, Zhiyong; Zheng, Xu; Yang, Kailun
Abstract

The perception capability of robotic systems relies on the richness of the dataset. Although Segment Anything Model 2 (SAM2), trained on large datasets, demonstrates strong potential in perception tasks, its inherent training paradigm prevents it from being suitable for RGB-T tasks. To address these challenges, we propose SHIFNet, a novel SAM2-driven Hybrid Interaction Paradigm that unlocks the potential of SAM2 with linguistic guidance for efficient RGB-Thermal perception. Our framework consists of two key components: (1) a Semantic-Aware Cross-modal Fusion (SACF) module that dynamically balances modality contributions through text-guided affinity learning, overcoming SAM2's inherent RGB bias; and (2) a Heterogeneous Prompting Decoder (HPD) that enhances global semantic information through a semantic enhancement module and then combines it with category embeddings to amplify cross-modal semantic consistency. With 32.27M trainable parameters, SHIFNet achieves state-of-the-art segmentation performance on public benchmarks, reaching 89.8% on PST900 and 67.8% on FMB. The framework facilitates the adaptation of pre-trained large models to RGB-T segmentation tasks, effectively mitigating the high costs associated with data collection while endowing robotic systems with comprehensive perception capabilities. The source code will be made publicly available at https://github.com/iAsakiT3T/SHIFNet.
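
The abstract does not spell out how text-guided affinity balances the two modalities, so the following is only an illustrative sketch of the general idea, not the SACF module itself. It assumes per-pixel RGB and thermal feature maps and a set of class text embeddings that share one channel dimension; the function name `text_guided_fusion`, the cosine-similarity affinity, the max-over-classes score, and the `temperature` parameter are all hypothetical choices made for this example.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors along `axis` to unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def text_guided_fusion(rgb_feat, thermal_feat, text_emb, temperature=0.1):
    """Hypothetical text-guided modality weighting (not the paper's SACF).

    rgb_feat, thermal_feat: arrays of shape (H, W, C)
    text_emb: class text embeddings of shape (K, C)
    Returns the fused features (H, W, C) and the modality weights (H, W, 2).
    """
    text = l2_normalize(text_emb)

    # Cosine affinity between every pixel feature and every class embedding.
    aff_rgb = l2_normalize(rgb_feat) @ text.T          # (H, W, K)
    aff_thermal = l2_normalize(thermal_feat) @ text.T  # (H, W, K)

    # Assumption: a modality is trusted at a pixel in proportion to its
    # strongest agreement with any class embedding.
    score = np.stack([aff_rgb.max(axis=-1), aff_thermal.max(axis=-1)], axis=-1)
    score = score / temperature

    # Numerically stable softmax over the two modalities.
    score = score - score.max(axis=-1, keepdims=True)
    weights = np.exp(score)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    fused = weights[..., :1] * rgb_feat + weights[..., 1:] * thermal_feat
    return fused, weights


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.normal(size=(4, 4, 8))
    thermal = rng.normal(size=(4, 4, 8))
    classes = rng.normal(size=(3, 8))
    fused, weights = text_guided_fusion(rgb, thermal, classes)
    print(fused.shape, weights.shape)
```

In this sketch, a pixel whose thermal feature aligns more closely with some class embedding than its RGB feature receives a larger thermal weight, which is one simple way a fixed preference for RGB could be counteracted. A learned module would replace the fixed cosine score with trainable projections.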
