Visual Question Answering on MSVD-QA 1
Metrics
Accuracy
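As a minimal sketch, assuming Accuracy here means top-1 exact-match accuracy (the fraction of questions whose predicted answer equals the ground-truth answer), the metric can be computed as below. The function name and the lowercase/strip normalization are illustrative assumptions, not the benchmark's official evaluation script.

```python
def exact_match_accuracy(predictions: list[str], answers: list[str]) -> float:
    """Return the fraction of predictions that exactly match the reference answer.

    Assumption (hypothetical normalization): answers are compared after
    lowercasing and stripping surrounding whitespace; the official
    MSVD-QA scorer may normalize differently.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return correct / len(answers)


# Toy example: 3 of 4 answers match, so accuracy is 0.75.
preds = ["dog", "Man", "running", "cat"]
gold = ["dog", "man", "walking", "cat"]
print(exact_match_accuracy(preds, gold))
```

Under this definition, a table value such as 0.547 means that 54.7% of the test questions were answered exactly right.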
Results
Performance results of various models on this benchmark
Comparison table
Model name | Accuracy |
---|---|
multi-efficient-video-and-language | 0.547 |
omnivl-one-foundation-model-for-image | 0.510 |
heterogeneous-memory-enhanced-multimodal | 0.337 |
dualvgr-a-dual-visual-graph-reasoning-unit | 0.390 |
vast-a-vision-audio-subtitle-text-omni-1 | 0.60 |
hierarchical-conditional-relation-networks | 0.361 |
clover-towards-a-unified-video-language | 0.524 |
all-in-one-exploring-unified-video-language | 0.483 |
sas-video-qa-self-adaptive-sampling-for | 0.467 |
internvideo-general-video-foundation-models | 0.555 |
video-text-modeling-with-zero-shot-transfer | 0.569 |
open-vocabulary-video-question-answering-a | 0.438 |
x-2-vlm-all-in-one-pre-trained-model-for | 0.528 |
tgif-qa-toward-spatio-temporal-reasoning-in | 0.313 |
mammut-a-simple-architecture-for-joint | 0.602 |
unmasked-teacher-towards-training-efficient | 0.552 |
lightweight-recurrent-cross-modal-encoder-for | 0.478 |
motion-appearance-co-memory-networks-for | 0.317 |
x-2-vlm-all-in-one-pre-trained-model-for | 0.546 |
meltr-meta-loss-transformer-for-learning-to | 0.517 |
hitea-hierarchical-temporal-aware-video | 0.556 |
open-vocabulary-video-question-answering-a | 0.558 |
cosa-concatenated-sample-pretrained-vision | 0.60 |
mplug-2-a-modularized-multi-modal-foundation | 0.581 |
vid-tldr-training-free-token-merging-for | 0.549 |
sas-video-qa-self-adaptive-sampling-for | 0.469 |
open-vocabulary-video-question-answering-a | 0.495 |
ma-lmm-memory-augmented-large-multimodal | 0.606 |
valor-vision-audio-language-omni-perception | 0.60 |
align-and-prompt-video-and-language-pre | 0.459 |
noise-estimation-using-density-estimation-for | 0.351 |
video-question-answering-with-iterative-video | 0.486 |
an-empirical-study-of-end-to-end-video | 0.547 |
vlab-enhancing-video-language-pre-training-by | 0.61 |
git-a-generative-image-to-text-transformer | 0.568 |
open-vocabulary-video-question-answering-a | 0.477 |
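The rows above are not ordered by score. The following minimal sketch parses rows in this `name | accuracy |` layout and ranks them from highest to lowest accuracy; it skips the header and separator rows and accepts values with or without a leading zero (for example `.602`). The helper names `parse_rows` and `rank` are hypothetical, and only four rows copied from the table are used as sample input.

```python
def parse_rows(text: str) -> list[tuple[str, float]]:
    """Parse 'model | accuracy |' lines, skipping header and separator rows."""
    rows = []
    for line in text.strip().splitlines():
        # Split on '|' and drop empty cells produced by the trailing pipe.
        cells = [c.strip() for c in line.split("|") if c.strip()]
        if len(cells) != 2:
            continue
        name, score = cells
        try:
            rows.append((name, float(score)))  # float(".602") parses as 0.602
        except ValueError:
            continue  # header ('Accuracy') or separator ('---') row
    return rows


def rank(rows: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort rows by accuracy, highest first."""
    return sorted(rows, key=lambda r: r[1], reverse=True)


sample = """
Model name | Accuracy |
---|---|
tgif-qa-toward-spatio-temporal-reasoning-in | 0.313 |
mammut-a-simple-architecture-for-joint | .602 |
vlab-enhancing-video-language-pre-training-by | 0.61 |
ma-lmm-memory-augmented-large-multimodal | 0.606 |
"""

for position, (name, acc) in enumerate(rank(parse_rows(sample)), start=1):
    print(f"{position}. {name}: {acc:.3f}")
```

Note that several model names appear more than once in the table with different scores (presumably different variants or settings), so a ranking like this keeps every row rather than deduplicating by name.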