Video Question Answering On Situated
Evaluation Metrics
Average Accuracy
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Average Accuracy |
---|---|
mist-multi-modal-iterative-spatial-temporal | 51.13 |
traveler-a-multi-lmm-agent-framework-for | 44.9 |
learning-situation-hyper-graphs-for-video | 39.47 |
flamingo-a-visual-language-model-for-few-shot-1 | 42.8 |
self-chained-image-language-model-for-video-1 | 64.9 |
all-in-one-exploring-unified-video-language | 47.5 |
glance-and-focus-memory-prompting-for-multi-1 | 53.94 |
vlap-efficient-video-language-alignment-via | 67.1 |
self-chained-image-language-model-for-video-1 | 44.6 |
flamingo-a-visual-language-model-for-few-shot-1 | 39.7 |
large-language-models-are-temporal-and-causal | 65.4 |
internvideo-general-video-foundation-models | 58.7 |
flamingo-a-visual-language-model-for-few-shot-1 | 41.8 |
revisiting-the-video-in-video-language | 48.37 |
anymal-an-efficient-and-scalable-any-modality | 48.2 |
flamingo-a-visual-language-model-for-few-shot-1 | 42.4 |
glance-and-focus-memory-prompting-for-multi-1 | 53.86 |