Temporal Relation Extraction On Vinoground
Evaluation Metrics
Group Score
Text Score
Video Score
Evaluation Results
Performance of each model on this benchmark
Comparison Table
Model Name | Group Score | Text Score | Video Score |
---|---|---|---|
Model 1 | 24.6 | 54 | 38.2 |
Model 2 | 6.2 | 21.8 | 25.6 |
Model 3 | 6.8 | 21.8 | 26.2 |
Model 4 | 3.8 | 23 | 21.2 |
qwen2-vl-enhancing-vision-language-model-s | 15.2 | 40.2 | 32.4 |
Model 6 | 35 | 59.2 | 51 |
Model 7 | 6.2 | 24 | 22.4 |
Model 8 | 10.6 | 32.8 | 28.8 |
imagebind-one-embedding-space-to-bind-them | 0.6 | 9.4 | 3.4 |
llava-onevision-easy-visual-task-transfer | 21.8 | 48.4 | 35.2 |
vtimellm-empower-llm-to-grasp-video-moments | 5.2 | 19.4 | 27 |
ma-lmm-memory-augmented-large-multimodal | 6.8 | 23.8 | 25.6 |
llava-onevision-easy-visual-task-transfer | 14.6 | 41.6 | 29.4 |
gemini-1-5-unlocking-multimodal-understanding | 12.4 | 37 | 27.6 |
gemini-1-5-unlocking-multimodal-understanding | 10.2 | 35.8 | 22.6 |
internlm-xcomposer-2-5-a-versatile-large | 9.6 | 28.8 | 27.8 |
videoclip-contrastive-pre-training-for-zero | 1.2 | 17 | 2.8 |
video-llava-learning-united-visual-1 | 6.6 | 24.8 | 25.8 |
languagebind-extending-video-language | 1.2 | 10.6 | 5 |
videollama-2-advancing-spatial-temporal | 8.4 | 36.2 | 21.8 |
internlm-xcomposer-2-5-a-versatile-large | 9 | 30.8 | 28.4 |
Model 22 | 5.2 | 25.8 | 22.2 |
qwen2-vl-enhancing-vision-language-model-s | 17.4 | 50.4 | 32.6 |
2408-01800 | 11.2 | 32.6 | 29.2 |