HyperAI

Temporal Relation Extraction On Vinoground

Metriken

Group Score
Text Score
Video Score

Ergebnisse

Leistungsergebnisse verschiedener Modelle zu diesem Benchmark

Vergleichstabelle
ModellnameGroup ScoreText ScoreVideo Score
Modell 124.65438.2
Modell 26.221.825.6
Modell 36.821.826.2
Modell 43.82321.2
qwen2-vl-enhancing-vision-language-model-s15.240.232.4
Modell 63559.251
Modell 76.22422.4
Modell 810.632.828.8
imagebind-one-embedding-space-to-bind-them0.69.43.4
llava-onevision-easy-visual-task-transfer21.848.435.2
vtimellm-empower-llm-to-grasp-video-moments5.219.427
ma-lmm-memory-augmented-large-multimodal6.823.825.6
llava-onevision-easy-visual-task-transfer14.641.629.4
gemini-1-5-unlocking-multimodal-understanding12.43727.6
gemini-1-5-unlocking-multimodal-understanding10.235.822.6
internlm-xcomposer-2-5-a-versatile-large9.628.827.8
videoclip-contrastive-pre-training-for-zero1.2172.8
video-llava-learning-united-visual-16.624.825.8
languagebind-extending-video-language1.210.65
videollama-2-advancing-spatial-temporal8.436.221.8
internlm-xcomposer-2-5-a-versatile-large930.828.4
Modell 225.225.822.2
qwen2-vl-enhancing-vision-language-model-s17.450.432.6
2408-0180011.232.629.2