HyperAI

Visual Question Answering On Msrvtt Qa 1

Metrics

Accuracy
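For open-ended answering on MSRVTT-QA, Accuracy is commonly reported as the fraction of questions whose predicted answer exactly matches the ground-truth answer. Below is a minimal Python sketch that assumes one reference answer per question and case-insensitive exact matching. The function name `qa_accuracy` is hypothetical, and individual papers may normalize answers differently.

```python
from typing import Sequence


def qa_accuracy(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Fraction of questions whose predicted answer exactly matches the reference.

    Assumption: one reference answer per question, compared after stripping
    whitespace and lowercasing. Published results may use other normalization.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not predictions:
        raise ValueError("at least one question is required")
    # Count exact matches after light normalization (True counts as 1).
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return correct / len(predictions)


preds = ["dog", "Man", "cooking", "red"]
golds = ["dog", "man", "singing", "red"]
print(round(qa_accuracy(preds, golds), 3))  # 0.75
```

Under this convention, a leaderboard value such as 0.470 means 47.0% of test questions were answered exactly right.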

Results

Performance of various models on this benchmark

Comparison table
| Model name | Accuracy |
| --- | --- |
| vid-tldr-training-free-token-merging-for | 0.470 |
| unmasked-teacher-towards-training-efficient | 0.471 |
| less-is-more-clipbert-for-video-and-language | 0.374 |
| open-vocabulary-video-question-answering-a | 0.395 |
| flamingo-a-visual-language-model-for-few-shot-1 | 0.310 |
| open-vocabulary-video-question-answering-a | 0.470 |
| all-in-one-exploring-unified-video-language | 0.443 |
| video-text-as-game-players-hierarchical | 0.462 |
| motion-appearance-co-memory-networks-for | 0.32 |
| video-text-modeling-with-zero-shot-transfer | 0.463 |
| tgif-qa-toward-spatio-temporal-reasoning-in | 0.309 |
| clover-towards-a-unified-video-language | 0.441 |
| align-and-prompt-video-and-language-pre | 0.421 |
| video-question-answering-with-iterative-video | 0.457 |
| vlab-enhancing-video-language-pre-training-by | 0.496 |
| heterogeneous-memory-enhanced-multimodal | 0.33 |
| x-2-vlm-all-in-one-pre-trained-model-for | 0.45 |
| sas-video-qa-self-adaptive-sampling-for | 0.438 |
| open-vocabulary-video-question-answering-a | 0.418 |
| lightweight-recurrent-cross-modal-encoder-for | 0.42 |
| dualvgr-a-dual-visual-graph-reasoning-unit | 0.355 |
| sas-video-qa-self-adaptive-sampling-for | 0.440 |
| omnivl-one-foundation-model-for-image | 0.441 |
| flamingo-a-visual-language-model-for-few-shot-1 | 0.174 |
| multi-efficient-video-and-language | 0.478 |
| mammut-a-simple-architecture-for-joint | 0.495 |
| x-2-vlm-all-in-one-pre-trained-model-for | 0.455 |
| sas-video-qa-self-adaptive-sampling-for | 0.423 |
| hierarchical-conditional-relation-networks | 0.356 |
| expectation-maximization-contrastive-learning | 0.458 |
| internvideo-general-video-foundation-models | 0.471 |
| hitea-hierarchical-temporal-aware-video | 0.459 |
| mplug-2-a-modularized-multi-modal-foundation | 0.480 |
| flamingo-a-visual-language-model-for-few-shot-1 | 0.474 |