Multimodal Intent Recognition on MIntRec
Metrics
Accuracy (20 classes)
Accuracy (Binary)
Results
Reported accuracy (%) of various models on this benchmark
Model Name | Accuracy (20 classes) | Accuracy (Binary) | Paper Title | Repository |
---|---|---|---|---|
MulT (Text + Audio + Video) | 72.52 | 89.19 | MIntRec: A New Dataset for Multimodal Intent Recognition | |
SPECTRA | 73.48 | - | Speech-Text Dialog Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment | |
TCL-MAP | 73.62 | - | Token-Level Contrastive Learning with Modality-Aware Prompting for Multimodal Intent Recognition | |
Human | 85.51 | 94.72 | MIntRec: A New Dataset for Multimodal Intent Recognition | |
MAG-BERT (Text + Audio + Video) | 72.65 | 89.24 | MIntRec: A New Dataset for Multimodal Intent Recognition | |
MISA (Text + Audio + Video) | 72.29 | 89.21 | MIntRec: A New Dataset for Multimodal Intent Recognition | |