Zero Shot Video Question Answer On Egoschema 1
Evaluation Metrics
Accuracy
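The page does not define the metric. A minimal sketch, assuming Accuracy is the percentage of questions for which the model's chosen option matches the reference answer (the function name and the sample inputs are hypothetical, not part of the benchmark tooling):

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that equal the reference answers.

    Assumption: one predicted option index per question, compared by equality.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count the positions where the predicted option matches the reference.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical example: 3 of 4 answers match.
example_score = accuracy([0, 1, 2, 3], [0, 1, 2, 4])
```

Under this assumption, a score such as 61.7 in the table would mean that 61.7% of the evaluated questions were answered correctly.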
Evaluation Results
Performance results of each model on this benchmark
Comparison Table
Model Name | Accuracy |
---|---|
timechat-a-time-sensitive-multimodal-large | 33.0 |
mvbench-a-comprehensive-multi-modal-video | 54.4 |
tarsier-recipes-for-training-and-evaluating-1 | 61.7 |
a-simple-llm-framework-for-long-range-video | 50.3 |
internvideo-general-video-foundation-models | 32.1 |
videotree-adaptive-tree-based-video | 61.1 |
vamos-versatile-action-models-for-video | 53.6 |
understanding-long-videos-in-one-multimodal | 37.6 |
self-chained-image-language-model-for-video-1 | 22.7 |
too-many-frames-not-all-useful-efficient | 61.1 |
bimba-selective-scan-compression-for-long | 71.14 |
Model 12 | 20.0 |
vamos-versatile-action-models-for-video | 48.3 |
mvbench-a-comprehensive-multi-modal-video | 55.8 |
videollama-2-advancing-spatial-temporal | 63.9 |
video-rag-visually-aligned-retrieval | 66.7 |
mplug-owl-modularization-empowers-large | 31.1 |
linvt-empower-your-image-level-large-language | 69.5 |
traveler-a-multi-lmm-agent-framework-for | 53.3 |
mvbench-a-comprehensive-multi-modal-video | 56.7 |
a-simple-llm-framework-for-long-range-video | 33.5 |
video-recap-recursive-captioning-of-hour-long | 50.23 |
zero-shot-video-question-answering-via-frozen | 26.9 |
vamos-versatile-action-models-for-video | 36.7 |
language-repository-for-long-video | 41.2 |
internvideo2-scaling-video-foundation-models | 60.2 |
longvu-spatiotemporal-adaptive-compression | 67.6 |
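The rows above are not sorted, and some identifiers appear more than once with different scores. A rough sketch of how the pipe-delimited rows could be parsed and ranked by Accuracy; the helper name `parse_rows` and the variable names are hypothetical, and only four rows copied from the table are used as sample input:

```python
SAMPLE = """
self-chained-image-language-model-for-video-1 | 22.7 |
longvu-spatiotemporal-adaptive-compression | 67.6 |
bimba-selective-scan-compression-for-long | 71.14 |
linvt-empower-your-image-level-large-language | 69.5 |
"""


def parse_rows(table_text):
    """Parse 'name | score |' lines into (name, score) tuples.

    Lines whose second cell is not a number (header, separator) are skipped.
    """
    rows = []
    for line in table_text.strip().splitlines():
        # Split on the pipe and drop the empty cell left by the trailing pipe.
        cells = [cell.strip() for cell in line.split("|")]
        cells = [cell for cell in cells if cell]
        if len(cells) != 2:
            continue
        name, score = cells
        try:
            rows.append((name, float(score)))
        except ValueError:
            continue  # not a data row
    return rows


rows = parse_rows(SAMPLE)
# Highest accuracy first; duplicate identifiers are kept as separate entries.
ranked = sorted(rows, key=lambda row: row[1], reverse=True)

for name, score in ranked:
    print(f"{score:6.2f}  {name}")
```

Repeated identifiers are kept as separate entries here on the assumption that they are distinct model variants from the same source; the page itself does not say so.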