HyperAI

Temporal Relation Extraction On Vinoground

Metrics

Group Score
Text Score
Video Score
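
The page lists the three metrics without defining them. The sketch below shows one way such scores could be computed. It assumes the Winoground-style convention in which each sample is a pair of captions and a pair of videos, and a sample counts only when both of its questions are answered correctly. The `PairResult` record and the `vinoground_scores` function are hypothetical names introduced here for illustration; they are not taken from the benchmark's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PairResult:
    """Hypothetical per-sample record for one (caption, video) pair of pairs."""
    # For each of the two videos: did the model pick the matching caption?
    text_correct: Tuple[bool, bool]
    # For each of the two captions: did the model pick the matching video?
    video_correct: Tuple[bool, bool]


def vinoground_scores(results: List[PairResult]) -> Dict[str, float]:
    """Return text, video, and group scores as percentages.

    Assumed convention: a sample earns the text (video) score only if both
    of its text (video) questions are right, and the group score only if
    all four questions are right.
    """
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    text_hits = sum(all(r.text_correct) for r in results)
    video_hits = sum(all(r.video_correct) for r in results)
    group_hits = sum(
        all(r.text_correct) and all(r.video_correct) for r in results
    )
    return {
        "text": 100.0 * text_hits / n,
        "video": 100.0 * video_hits / n,
        "group": 100.0 * group_hits / n,
    }


example = [
    PairResult((True, True), (True, True)),      # everything right
    PairResult((True, True), (True, False)),     # text right, video wrong
    PairResult((False, True), (True, True)),     # text wrong, video right
    PairResult((False, False), (False, False)),  # everything wrong
]
print(vinoground_scores(example))
```

Under this assumed definition, a group score can never exceed the smaller of the text and video scores.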

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                     Group Score  Text Score  Video Score
Model 1                                        24.6         54          38.2
Model 2                                        6.2          21.8        25.6
Model 3                                        6.8          21.8        26.2
Model 4                                        3.8          23          21.2
qwen2-vl-enhancing-vision-language-model-s     15.2         40.2        32.4
Model 6                                        35           59.2        51
Model 7                                        6.2          24          22.4
Model 8                                        10.6         32.8        28.8
imagebind-one-embedding-space-to-bind-them     0.6          9.4         3.4
llava-onevision-easy-visual-task-transfer      21.8         48.4        35.2
vtimellm-empower-llm-to-grasp-video-moments    5.2          19.4        27
ma-lmm-memory-augmented-large-multimodal       6.8          23.8        25.6
llava-onevision-easy-visual-task-transfer      14.6         41.6        29.4
gemini-1-5-unlocking-multimodal-understanding  12.4         37          27.6
gemini-1-5-unlocking-multimodal-understanding  10.2         35.8        22.6
internlm-xcomposer-2-5-a-versatile-large       9.6          28.8        27.8
videoclip-contrastive-pre-training-for-zero    1.2          17          2.8
video-llava-learning-united-visual-1           6.6          24.8        25.8
languagebind-extending-video-language          1.2          10.6        5
videollama-2-advancing-spatial-temporal        8.4          36.2        21.8
internlm-xcomposer-2-5-a-versatile-large       9            30.8        28.4
Model 22                                       5.2          25.8        22.2
qwen2-vl-enhancing-vision-language-model-s     17.4         50.4        32.6
2408-01800                                     11.2         32.6        29.2