HyperAI

Video Question Answering On Situated

Evaluation Metric

Average Accuracy

Evaluation Results

Performance results of each model on this benchmark

Comparison Table
Model Name                                        Average Accuracy
mist-multi-modal-iterative-spatial-temporal       51.13
traveler-a-multi-lmm-agent-framework-for          44.9
learning-situation-hyper-graphs-for-video         39.47
flamingo-a-visual-language-model-for-few-shot-1   42.8
self-chained-image-language-model-for-video-1     64.9
all-in-one-exploring-unified-video-language       47.5
glance-and-focus-memory-prompting-for-multi-1     53.94
vlap-efficient-video-language-alignment-via       67.1
self-chained-image-language-model-for-video-1     44.6
flamingo-a-visual-language-model-for-few-shot-1   39.7
large-language-models-are-temporal-and-causal     65.4
internvideo-general-video-foundation-models       58.7
flamingo-a-visual-language-model-for-few-shot-1   41.8
revisiting-the-video-in-video-language            48.37
anymal-an-efficient-and-scalable-any-modality     48.2
flamingo-a-visual-language-model-for-few-shot-1   42.4
glance-and-focus-memory-prompting-for-multi-1     53.86
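
The table above is not sorted, and several model identifiers appear more than once. The following is a minimal Python sketch of how the entries could be ranked by Average Accuracy and reduced to one best score per identifier. The helper names `rank_by_accuracy` and `best_per_model` are hypothetical. The rows are copied from the table, on the assumption that a trailing "-1" belongs to the model identifier rather than to the score (a score above 100 would not be a valid accuracy).

```python
# Rows copied from the comparison table: (model identifier, average accuracy).
# Assumption: a trailing "-1" is part of the identifier, not of the score.
rows = [
    ("mist-multi-modal-iterative-spatial-temporal", 51.13),
    ("traveler-a-multi-lmm-agent-framework-for", 44.9),
    ("learning-situation-hyper-graphs-for-video", 39.47),
    ("flamingo-a-visual-language-model-for-few-shot-1", 42.8),
    ("self-chained-image-language-model-for-video-1", 64.9),
    ("all-in-one-exploring-unified-video-language", 47.5),
    ("glance-and-focus-memory-prompting-for-multi-1", 53.94),
    ("vlap-efficient-video-language-alignment-via", 67.1),
    ("self-chained-image-language-model-for-video-1", 44.6),
    ("flamingo-a-visual-language-model-for-few-shot-1", 39.7),
    ("large-language-models-are-temporal-and-causal", 65.4),
    ("internvideo-general-video-foundation-models", 58.7),
    ("flamingo-a-visual-language-model-for-few-shot-1", 41.8),
    ("revisiting-the-video-in-video-language", 48.37),
    ("anymal-an-efficient-and-scalable-any-modality", 48.2),
    ("flamingo-a-visual-language-model-for-few-shot-1", 42.4),
    ("glance-and-focus-memory-prompting-for-multi-1", 53.86),
]


def rank_by_accuracy(entries):
    """Return the entries sorted from highest to lowest accuracy."""
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def best_per_model(entries):
    """Keep only the highest score recorded for each identifier."""
    best = {}
    for slug, accuracy in entries:
        if slug not in best or accuracy > best[slug]:
            best[slug] = accuracy
    return best


ranked = rank_by_accuracy(rows)
for position, (slug, accuracy) in enumerate(ranked, start=1):
    print(f"{position:2d}. {slug:<50} {accuracy:.2f}")
```

Repeated identifiers presumably correspond to different variants or settings of the same model; the page does not say which, so `best_per_model` simply keeps the highest listed score for each.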