HyperAI

Multimodal Emotion Recognition

Multimodal Emotion Recognition aims to identify human emotions accurately by integrating information from multiple modalities, such as acoustic (A), textual (T), and visual (V) signals. Combining these complementary cues improves the precision and robustness of emotion analysis, allowing systems to cope with complex and varied real-world scenarios. On the IEMOCAP dataset, all models must use the standard five-category emotion classification and be evaluated with the leave-one-session-out (LOSO) protocol. The technology has significant application value in fields such as human-computer interaction, mental health monitoring, and intelligent customer service.
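To make the LOSO protocol concrete, here is a minimal sketch of how the cross-validation folds could be generated. IEMOCAP contains five recorded sessions, and LOSO trains on four sessions while testing on the held-out fifth, cycling through all five. The `loso_folds` helper and the toy `session_ids` list are illustrative assumptions, not part of any official IEMOCAP tooling:

```python
def loso_folds(session_ids):
    """Leave-one-session-out: yield (held_out, train_idx, test_idx) tuples,
    holding out one full session per fold so that no speaker/session
    appears in both train and test."""
    for held_out in sorted(set(session_ids)):
        test = [i for i, s in enumerate(session_ids) if s == held_out]
        train = [i for i, s in enumerate(session_ids) if s != held_out]
        yield held_out, train, test

# Toy example: 10 utterances spread across IEMOCAP's 5 sessions.
session_ids = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
folds = list(loso_folds(session_ids))  # 5 folds, one per held-out session
```

Each fold's test set contains exactly one session, and reported LOSO results are typically averaged over the five folds.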