HyperAI

Zero-Shot Video Question Answering on ActivityNet

Evaluation Metrics

Accuracy
Confidence Score
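
The page does not say how these two metrics are computed. A minimal sketch, assuming the judge-based protocol commonly used for zero-shot video QA: each predicted answer receives a yes/no correctness verdict and a rating on a 0 to 5 scale, Accuracy is the percentage of "yes" verdicts, and Confidence Score is the mean rating. The `Judgement` record and the `summarize` helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class Judgement:
    """One judged answer (hypothetical record layout, not from the source)."""
    correct: bool   # the judge's yes/no verdict
    score: float    # the judge's rating, assumed to lie in [0, 5]


def summarize(judgements: Iterable[Judgement]) -> Tuple[float, float]:
    """Return (accuracy in percent, mean score), each rounded to one decimal."""
    items = list(judgements)
    if not items:
        raise ValueError("no judgements to summarize")
    accuracy = 100.0 * sum(j.correct for j in items) / len(items)
    mean_score = sum(j.score for j in items) / len(items)
    return round(accuracy, 1), round(mean_score, 1)


# Four made-up judgements, used only to exercise the helper.
example = [
    Judgement(True, 5),
    Judgement(True, 4),
    Judgement(False, 1),
    Judgement(False, 2),
]
print(summarize(example))  # (50.0, 3.0)
```

Under this assumption the two columns are independent aggregates over the same set of judged answers, which would explain why models with equal accuracy can still differ in score.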

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model Name                                     Accuracy  Confidence Score
moviechat-from-dense-token-to-sparse-memory    45.7      3.1
one-for-all-video-conversation-is-feasible     46.1      3.2
tarsier-recipes-for-training-and-evaluating-1  61.6      3.7
mvbench-a-comprehensive-multi-modal-video      49.1      3.3
chat-univi-unified-visual-representation       46.1      3.3
llama-vid-an-image-is-worth-2-tokens-in-large  47.5      3.3
pllava-parameter-free-llava-extension-from-1   60.9      3.7
an-image-grid-can-be-worth-a-video-zero-shot   58.4      3.5
slowfast-llava-a-strong-training-free          59.2      3.5
llava-mini-efficient-image-and-video-large     53.5      3.5
zero-shot-video-question-answering-via-frozen  24.7      -
videochat-chat-centric-video-understanding     26.5      2.2
llama-vid-an-image-is-worth-2-tokens-in-large  47.4      3.3
video-chatgpt-towards-detailed-video           35.2      2.7
flash-vstream-memory-based-real-time           51.9      3.4
video-llava-learning-united-visual-1           45.3      3.3
ts-llava-constructing-visual-tokens-through    58.9      3.5
elysium-exploring-object-level-perception-in   43.4      2.9
ppllava-varied-video-sequence-understanding    60.7      3.6
linvt-empower-your-image-level-large-language  60.1      3.6
chat-univi-unified-visual-representation       46.4      3.6
video-llama-an-instruction-tuned-audio-visual  12.4      1.1
video-lavit-unified-video-language-pre         50.1      3.3
st-llm-large-language-models-are-effective-1   50.9      3.3
cat-enhancing-multimodal-large-language-model  50.2      3.5
videogpt-integrating-image-and-video-encoders  50.6      3.6
llama-adapter-v2-parameter-efficient-visual    34.2      2.7
minigpt4-video-advancing-multimodal-llms-for   46.3      -
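
The rows above are not listed in ranked order. As a rough sketch, a few of them can be loaded and sorted by accuracy as follows. The `Row` type and the `rank` helper are hypothetical names, the data is only a subset of the table, and treating a missing Confidence Score ("-") as the lowest tie-breaker is an assumption of this example rather than a rule stated by the page.

```python
from typing import List, NamedTuple, Optional


class Row(NamedTuple):
    model: str
    accuracy: float
    score: Optional[float]  # None where the table shows "-"


# A subset of rows copied from the comparison table above.
ROWS: List[Row] = [
    Row("video-chatgpt-towards-detailed-video", 35.2, 2.7),
    Row("pllava-parameter-free-llava-extension-from-1", 60.9, 3.7),
    Row("zero-shot-video-question-answering-via-frozen", 24.7, None),
    Row("tarsier-recipes-for-training-and-evaluating-1", 61.6, 3.7),
    Row("video-llama-an-instruction-tuned-audio-visual", 12.4, 1.1),
    Row("ppllava-varied-video-sequence-understanding", 60.7, 3.6),
]


def sort_key(row: Row):
    # Missing scores sort below every real score when accuracies tie.
    return (row.accuracy, row.score if row.score is not None else -1.0)


def rank(rows: List[Row]) -> List[Row]:
    """Return rows ordered by accuracy, highest first."""
    return sorted(rows, key=sort_key, reverse=True)


for position, row in enumerate(rank(ROWS), start=1):
    score = "-" if row.score is None else f"{row.score:.1f}"
    print(f"{position}. {row.model}  {row.accuracy:.1f}  {score}")
```

Sorting on accuracy first matches how the table's primary metric is usually read. Ranking by Confidence Score instead only requires swapping the two elements of the key tuple.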