Video Question Answering on MSRVTT-MC
Metrics
Accuracy
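The leaderboard reports a single metric. Here is a minimal sketch, assuming that Accuracy means the percentage of multiple-choice questions for which the model selects the correct candidate, and that a model picks whichever candidate it scores highest. The function names `pick_answers` and `multiple_choice_accuracy` and the example data are hypothetical. They are not taken from any of the papers listed below.

```python
from typing import Sequence


def pick_answers(scores: Sequence[Sequence[float]]) -> list[int]:
    """Hypothetical helper: for each question, return the index of the
    highest-scoring candidate (assumes the model emits one score per candidate)."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def multiple_choice_accuracy(predicted: Sequence[int], gold: Sequence[int]) -> float:
    """Percentage of questions whose predicted candidate index equals the gold index."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        raise ValueError("need at least one question")
    correct = sum(p == g for p, g in zip(predicted, gold))
    return 100.0 * correct / len(gold)


# Hypothetical example: 4 questions, each with 5 candidate answers indexed 0-4.
preds = [2, 0, 4, 1]
labels = [2, 0, 3, 1]
print(multiple_choice_accuracy(preds, labels))  # 75.0
```

Ties in `pick_answers` resolve to the lowest candidate index, because Python's built-in `max` returns the first maximal element it encounters.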
Results
Performance of various models on this benchmark
Model | Accuracy (%) | Paper Title | Repository |
---|---|---|---|
Singularity-temporal | 93.7 | Revealing Single Frame Bias for Video-and-Language Learning | - |
Norton | 92.7 | Multi-granularity Correspondence Learning from Long-term Noisy Videos | - |
HiTeA | 97.4 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
VindLU | 95.5 | VindLU: A Recipe for Effective Video-and-Language Pretraining | - |
VIOLETv2 | 97.6 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | - |
Singularity | 92.1 | Revealing Single Frame Bias for Video-and-Language Learning | - |
Clover | 95.2 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | - |