HyperAI

Zero-Shot Video Question Answering on MSVD-QA

Metrics

Accuracy
Confidence Score
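The page names the two metrics but does not define them. The sketch below assumes the GPT-assisted protocol used by Video-ChatGPT (one of the models in the table). Under that protocol, an evaluator model marks each predicted answer as correct or incorrect and gives it a score from 0 to 5. If that assumption holds, Accuracy is the percentage of answers judged correct and Confidence Score is the mean of the per-question scores. The `Judgment` record and the `aggregate` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Judgment:
    """Hypothetical per-question verdict from an evaluator model (assumed protocol)."""
    correct: bool  # assumed yes/no verdict: does the prediction match the reference answer?
    score: float   # assumed quality rating on a 0-5 scale


def aggregate(judgments: Iterable[Judgment]) -> Tuple[float, float]:
    """Return (accuracy in percent, mean confidence score) over all questions."""
    items = list(judgments)
    if not items:
        raise ValueError("no judgments to aggregate")
    # Accuracy: share of answers judged correct, expressed as a percentage.
    accuracy = 100.0 * sum(j.correct for j in items) / len(items)
    # Confidence Score: plain average of the per-question scores.
    mean_score = sum(j.score for j in items) / len(items)
    return accuracy, mean_score


if __name__ == "__main__":
    acc, conf = aggregate([Judgment(True, 5), Judgment(False, 2), Judgment(True, 4)])
    print(f"Accuracy: {acc:.1f}  Confidence Score: {conf:.1f}")
```

As an example, take three judgments: correct with score 5, incorrect with score 2, and correct with score 4. They produce an accuracy of about 66.7 and a mean confidence score of about 3.7.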

Results

Performance of various models on this benchmark

Comparison table
Model name                                     Accuracy (%)  Confidence Score
one-for-all-video-conversation-is-feasible     67.0          3.6
videogpt-integrating-image-and-video-encoders  72.4          3.6
slowfast-llava-a-strong-training-free          79.9          4.1
video-chatgpt-towards-detailed-video           64.9          3.3
ppllava-varied-video-sequence-understanding    77.1          4.0
elysium-exploring-object-level-perception-in   75.8          3.7
videochat-chat-centric-video-understanding     56.3          2.8
flash-vstream-memory-based-real-time           80.3          3.9
vila-on-pre-training-for-visual-language       80.1          -
llava-mini-efficient-image-and-video-large     70.9          4.0
ts-llava-constructing-visual-tokens-through    79.4          4.1
an-image-grid-can-be-worth-a-video-zero-shot   79.6          4.1
llama-vid-an-image-is-worth-2-tokens-in-large  69.7          3.7
mvbench-a-comprehensive-multi-modal-video      70.0          3.9
video-llama-an-instruction-tuned-audio-visual  51.6          2.5
linvt-empower-your-image-level-large-language  80.2          4.4
pllava-parameter-free-llava-extension-from-1   79.9          4.2
llama-adapter-v2-parameter-efficient-visual    54.9          3.1
zero-shot-video-question-answering-via-frozen  33.8          -
st-llm-large-language-models-are-effective-1   74.6          3.9
video-llava-learning-united-visual-1           70.7          3.9
tarsier-recipes-for-training-and-evaluating-1  80.3          4.2
video-lavit-unified-video-language-pre         73.2          3.9
moviechat-from-dense-token-to-sparse-memory    75.2          2.9
llama-vid-an-image-is-worth-2-tokens-in-large  70.0          3.7
minigpt4-video-advancing-multimodal-llms-for   73.92         -
chat-univi-unified-visual-representation       69.3          3.7