Video Question Answering on MSRVTT-QA
Metrics
Accuracy
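For open-ended video QA benchmarks such as MSRVTT-QA, accuracy is typically computed as exact-match between the predicted answer string and the ground-truth answer. A minimal sketch (with hypothetical data; the function name and normalization are illustrative assumptions, not the official evaluation script):

```python
def accuracy(predictions, references):
    """Exact-match accuracy: fraction of predictions equal to the reference
    answer after simple whitespace/case normalization (an assumption here)."""
    assert len(predictions) == len(references)
    correct = sum(p.strip().lower() == r.strip().lower()
                  for p, r in zip(predictions, references))
    return correct / len(references)

# Hypothetical model outputs vs. ground-truth answers
preds = ["dog", "running", "two"]
refs = ["dog", "walking", "two"]
print(round(accuracy(preds, refs), 2))  # → 0.67
```

Official evaluations may apply additional answer normalization; the idea is the same: one point per question whose answer matches the reference.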
Results
Performance results of various models on this benchmark.
Comparison table
| Model name | Accuracy |
|---|---|
| zero-shot-video-question-answering-via-frozen | 47.0 |
| mplug-2-a-modularized-multi-modal-foundation | 48.0 |
| zero-shot-video-question-answering-via-frozen | 16.7 |
| an-empirical-study-of-end-to-end-video | 44.5 |
| revealing-single-frame-bias-for-video-and | 43.9 |
| video-text-as-game-players-hierarchical | 46.2 |
| valor-vision-audio-language-omni-perception | 49.2 |
| revealing-single-frame-bias-for-video-and | 43.5 |
| vast-a-vision-audio-subtitle-text-omni-1 | 50.1 |
| mirasol3b-a-multimodal-autoregressive-model | 50.42 |
| vindlu-a-recipe-for-effective-video-and | 44.6 |
| cosa-concatenated-sample-pretrained-vision | 49.2 |
| ma-lmm-memory-augmented-large-multimodal | 48.5 |
| expectation-maximization-contrastive-learning | 45.8 |