HyperAI超神経

Video Question Answering On Situated

Metrics

Average Accuracy

Results

Performance results of each model on this benchmark

Comparison table
Model name                                       Average Accuracy
mist-multi-modal-iterative-spatial-temporal      51.13
traveler-a-multi-lmm-agent-framework-for         44.9
learning-situation-hyper-graphs-for-video        39.47
flamingo-a-visual-language-model-for-few-shot-1  42.8
self-chained-image-language-model-for-video-1    64.9
all-in-one-exploring-unified-video-language      47.5
glance-and-focus-memory-prompting-for-multi-1    53.94
vlap-efficient-video-language-alignment-via      67.1
self-chained-image-language-model-for-video-1    44.6
flamingo-a-visual-language-model-for-few-shot-1  39.7
large-language-models-are-temporal-and-causal    65.4
internvideo-general-video-foundation-models      58.7
flamingo-a-visual-language-model-for-few-shot-1  41.8
revisiting-the-video-in-video-language           48.37
anymal-an-efficient-and-scalable-any-modality    48.2
flamingo-a-visual-language-model-for-few-shot-1  42.4
glance-and-focus-memory-prompting-for-multi-1    53.86