
Tracing Intricate Cues in Dialogue: Joint Graph Structure and Sentiment Dynamics for Multimodal Emotion Recognition

Li, Jiang; Wang, Xiaoping; Zeng, Zhigang
Abstract

Multimodal emotion recognition in conversation (MERC) has garnered substantial research attention recently. Existing MERC methods face several challenges: (1) they fail to fully harness direct inter-modal cues, possibly leading to less-than-thorough cross-modal modeling; (2) they concurrently extract information from the same and different modalities at each network layer, potentially triggering conflicts from the fusion of multi-source data; (3) they lack the agility required to detect dynamic sentimental changes, perhaps resulting in inaccurate classification of utterances with abrupt sentiment shifts. To address these issues, a novel approach named GraphSmile is proposed for tracking intricate emotional cues in multimodal dialogues. GraphSmile comprises two key components, i.e., GSF and SDP modules. GSF ingeniously leverages graph structures to alternately assimilate inter-modal and intra-modal emotional dependencies layer by layer, adequately capturing cross-modal cues while effectively circumventing fusion conflicts. SDP is an auxiliary task to explicitly delineate the sentiment dynamics between utterances, promoting the model's ability to distinguish sentimental discrepancies. Furthermore, GraphSmile is effortlessly applied to multimodal sentiment analysis in conversation (MSAC), forging a unified multimodal affective model capable of executing MERC and MSAC tasks. Empirical results on multiple benchmarks demonstrate that GraphSmile can handle complex emotional and sentimental patterns, significantly outperforming baseline models.
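The sketch below is not the authors' implementation; it is a minimal, hedged reading of the two ideas the abstract names: a GSF-style block that alternates cross-modal and same-modal graph propagation in each layer, and an SDP-style auxiliary head that classifies whether sentiment shifts between adjacent utterances. All names, dimensions, the plain adjacency-matrix propagation, and the binary shift/no-shift formulation are assumptions for illustration only.

```python
# Hypothetical sketch, NOT the official GraphSmile code.
# Assumes per-utterance features for text/audio/vision and precomputed
# adjacency matrices; the real paper may define graphs and heads differently.
import torch
import torch.nn as nn


class GraphSmileSketch(nn.Module):
    def __init__(self, dim: int, num_layers: int = 2, num_emotions: int = 7):
        super().__init__()
        # One inter-modal and one intra-modal transform per layer (assumed design).
        self.inter_layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.intra_layers = nn.ModuleList([nn.Linear(dim, dim) for _ in range(num_layers)])
        self.emotion_head = nn.Linear(3 * dim, num_emotions)
        # SDP-style head: shift vs. no-shift for each adjacent utterance pair (assumption).
        self.sdp_head = nn.Linear(2 * 3 * dim, 2)

    @staticmethod
    def propagate(x, adj, layer):
        # Mean-aggregate neighbours, transform, and keep a residual connection.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return torch.relu(layer(adj @ x / deg)) + x

    def forward(self, text, audio, vision, adj_intra, adj_inter):
        # text/audio/vision: (N, dim) utterance features per modality.
        # adj_intra: (N, N) same-modality context edges.
        # adj_inter: (3N, 3N) cross-modal edges over the stacked modality nodes.
        n = text.size(0)
        nodes = torch.cat([text, audio, vision], dim=0)  # (3N, dim)
        for inter, intra in zip(self.inter_layers, self.intra_layers):
            # Alternate: first absorb inter-modal cues, then intra-modal context.
            nodes = self.propagate(nodes, adj_inter, inter)
            blocks = nodes.split(n, dim=0)
            nodes = torch.cat(
                [self.propagate(b, adj_intra, intra) for b in blocks], dim=0
            )
        t, a, v = nodes.split(n, dim=0)
        fused = torch.cat([t, a, v], dim=-1)  # (N, 3*dim)
        emo_logits = self.emotion_head(fused)
        # Pair each utterance with its predecessor for sentiment-dynamics prediction.
        pairs = torch.cat([fused[:-1], fused[1:]], dim=-1)
        shift_logits = self.sdp_head(pairs)
        return emo_logits, shift_logits
```

The alternation inside the loop is the point of the sketch: each layer runs one cross-modal propagation step followed by one within-modality step, rather than mixing both sources in a single aggregation, which is one plausible way to realize the "assimilate inter-modal and intra-modal dependencies layer by layer" behaviour the abstract describes while avoiding multi-source fusion conflicts.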
