Temporal Relation Extraction On Vinoground
Metrics
Group Score
Text Score
Video Score
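This page does not define the three metrics. The sketch below assumes they follow the Winoground-style convention: each benchmark item is a counterfactual pair with two text questions and two video questions, the text and video scores count pairs where both questions of that kind are answered correctly, and the group score counts pairs where all four are correct. The `PairResult` record and the `vinoground_style_scores` function are hypothetical names introduced for illustration, not part of any official evaluation code.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PairResult:
    # Hypothetical per-pair record (an assumption, not an official format):
    # each counterfactual pair yields two text questions and two video questions.
    text_correct: Tuple[bool, bool]
    video_correct: Tuple[bool, bool]


def vinoground_style_scores(results: List[PairResult]) -> Dict[str, float]:
    """Return text, video, and group scores as percentages.

    Assumed definitions (Winoground-style):
      text  = share of pairs with both text questions correct
      video = share of pairs with both video questions correct
      group = share of pairs with all four questions correct
    """
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    text = sum(all(r.text_correct) for r in results)
    video = sum(all(r.video_correct) for r in results)
    group = sum(all(r.text_correct) and all(r.video_correct) for r in results)
    return {
        "text": 100.0 * text / n,
        "video": 100.0 * video / n,
        "group": 100.0 * group / n,
    }


# Toy data: four pairs, only the first is fully correct.
example = [
    PairResult((True, True), (True, True)),
    PairResult((True, True), (False, True)),
    PairResult((True, False), (True, True)),
    PairResult((False, False), (False, False)),
]
print(vinoground_style_scores(example))
```

Under these assumed definitions the group score can never exceed the smaller of the text and video scores, which is consistent with every row reported in the results on this page.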
Results
Performance results of various models on this benchmark
| Model Name | Group Score | Text Score | Video Score | Paper Title | Repository |
|---|---|---|---|---|---|
| GPT-4o | 24.6 | 54 | 38.2 | - | - |
| LLaVA-NeXT-Video-7B | 6.2 | 21.8 | 25.6 | - | - |
| LLaVA-NeXT-Video-7B (CoT) | 6.8 | 21.8 | 26.2 | - | - |
| LLaVA-NeXT-Video-34B | 3.8 | 23 | 21.2 | - | - |
| Qwen2-VL-7B | 15.2 | 40.2 | 32.4 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | |
| GPT-4o (CoT) | 35 | 59.2 | 51 | - | - |
| Phi-3.5-Vision | 6.2 | 24 | 22.4 | - | - |
| Claude 3.5 Sonnet | 10.6 | 32.8 | 28.8 | - | - |
| ImageBind | 0.6 | 9.4 | 3.4 | ImageBind: One Embedding Space To Bind Them All | |
| LLaVA-OneVision-Qwen2-72B | 21.8 | 48.4 | 35.2 | LLaVA-OneVision: Easy Visual Task Transfer | |
| VTimeLLM | 5.2 | 19.4 | 27 | VTimeLLM: Empower LLM to Grasp Video Moments | |
| MA-LMM-Vicuna-7B | 6.8 | 23.8 | 25.6 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
| LLaVA-OneVision-Qwen2-7B | 14.6 | 41.6 | 29.4 | LLaVA-OneVision: Easy Visual Task Transfer | |
| Gemini-1.5-Pro (CoT) | 12.4 | 37 | 27.6 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| Gemini-1.5-Pro | 10.2 | 35.8 | 22.6 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| InternLM-XC-2.5 | 9.6 | 28.8 | 27.8 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | |
| VideoCLIP | 1.2 | 17 | 2.8 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | |
| Video-LLaVA-7B | 6.6 | 24.8 | 25.8 | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection | |
| LanguageBind | 1.2 | 10.6 | 5 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| VideoLLaMA2-72B | 8.4 | 36.2 | 21.8 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |