HyperAI
Temporal Relation Extraction On Vinoground
Metrics
Group Score
Text Score
Video Score
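The page lists these three metrics without defining them. The sketch below shows one way they could be computed, assuming Winoground-style definitions: each benchmark item pairs two videos with two captions, the text score counts an item only if the right caption is chosen for both videos, the video score counts it only if the right video is chosen for both captions, and the group score requires all four choices to be right. These definitions, the `ItemResult` type, and the `vinoground_scores` function are assumptions made for illustration and are not taken from this page.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ItemResult:
    """Hypothetical record of one benchmark item (two videos, two captions)."""
    # Per video: did the model pick the matching caption?
    text_correct: Tuple[bool, bool]
    # Per caption: did the model pick the matching video?
    video_correct: Tuple[bool, bool]


def vinoground_scores(results: List[ItemResult]) -> Dict[str, float]:
    """Return text, video, and group scores as percentages (assumed definitions)."""
    if not results:
        raise ValueError("at least one item result is required")
    n = len(results)
    text_hits = sum(all(r.text_correct) for r in results)
    video_hits = sum(all(r.video_correct) for r in results)
    # An item counts toward the group score only if all four choices are right.
    group_hits = sum(
        all(r.text_correct) and all(r.video_correct) for r in results
    )
    return {
        "text": 100.0 * text_hits / n,
        "video": 100.0 * video_hits / n,
        "group": 100.0 * group_hits / n,
    }


example = [
    ItemResult((True, True), (True, True)),
    ItemResult((True, True), (True, False)),
    ItemResult((False, True), (True, True)),
    ItemResult((False, False), (False, False)),
]
example_scores = vinoground_scores(example)
```

Under these assumed definitions the group score can never exceed either of the other two, which is consistent with every row in the results on this page.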
Results
Performance results of various models on this benchmark
| Model name | Group Score | Text Score | Video Score | Paper Title |
| --- | --- | --- | --- | --- |
| GPT-4o (CoT) | 35 | 59.2 | 51 | - |
| GPT-4o | 24.6 | 54 | 38.2 | - |
| LLaVA-OneVision-Qwen2-72B | 21.8 | 48.4 | 35.2 | LLaVA-OneVision: Easy Visual Task Transfer |
| Qwen2-VL-72B | 17.4 | 50.4 | 32.6 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| Qwen2-VL-7B | 15.2 | 40.2 | 32.4 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution |
| LLaVA-OneVision-Qwen2-7B | 14.6 | 41.6 | 29.4 | LLaVA-OneVision: Easy Visual Task Transfer |
| Gemini-1.5-Pro (CoT) | 12.4 | 37 | 27.6 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context |
| MiniCPM-2.6 | 11.2 | 32.6 | 29.2 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone |
| Claude 3.5 Sonnet | 10.6 | 32.8 | 28.8 | - |
| Gemini-1.5-Pro | 10.2 | 35.8 | 22.6 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context |
| InternLM-XC-2.5 | 9.6 | 28.8 | 27.8 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output |
| InternLM-XC-2.5 (CoT) | 9 | 30.8 | 28.4 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output |
| VideoLLaMA2-72B | 8.4 | 36.2 | 21.8 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs |
| LLaVA-NeXT-Video-7B (CoT) | 6.8 | 21.8 | 26.2 | - |
| MA-LMM-Vicuna-7B | 6.8 | 23.8 | 25.6 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding |
| Video-LLaVA-7B | 6.6 | 24.8 | 25.8 | Video-LLaVA: Learning United Visual Representation by Alignment Before Projection |
| LLaVA-NeXT-Video-7B | 6.2 | 21.8 | 25.6 | - |
| Phi-3.5-Vision | 6.2 | 24 | 22.4 | - |
| VTimeLLM | 5.2 | 19.4 | 27 | VTimeLLM: Empower LLM to Grasp Video Moments |
| LLaVA-NeXT-Video-34B (CoT) | 5.2 | 25.8 | 22.2 | - |
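Several models in the results above appear both with and without chain-of-thought prompting (CoT). As a minimal sketch, the snippet below copies those rows from the results and computes the score difference for each pair; the `ROWS` dictionary layout and the `cot_deltas` helper are illustrative names, not part of the benchmark or the site.

```python
from typing import Dict, Tuple

Scores = Tuple[float, float, float]  # (group, text, video)

# Values copied from the results above; only models listed both with and
# without CoT are included.
ROWS: Dict[str, Scores] = {
    "GPT-4o (CoT)": (35.0, 59.2, 51.0),
    "GPT-4o": (24.6, 54.0, 38.2),
    "Gemini-1.5-Pro (CoT)": (12.4, 37.0, 27.6),
    "Gemini-1.5-Pro": (10.2, 35.8, 22.6),
    "InternLM-XC-2.5 (CoT)": (9.0, 30.8, 28.4),
    "InternLM-XC-2.5": (9.6, 28.8, 27.8),
    "LLaVA-NeXT-Video-7B (CoT)": (6.8, 21.8, 26.2),
    "LLaVA-NeXT-Video-7B": (6.2, 21.8, 25.6),
}

COT_SUFFIX = " (CoT)"


def cot_deltas(rows: Dict[str, Scores]) -> Dict[str, Scores]:
    """Return (CoT score minus plain score) per metric for each paired model."""
    deltas: Dict[str, Scores] = {}
    for name, cot_scores in rows.items():
        if not name.endswith(COT_SUFFIX):
            continue
        base = name[: -len(COT_SUFFIX)]
        if base not in rows:
            continue  # no plain counterpart listed
        group, text, video = (
            round(c - b, 1) for c, b in zip(cot_scores, rows[base])
        )
        deltas[base] = (group, text, video)
    return deltas


deltas = cot_deltas(ROWS)
for model, (group, text, video) in deltas.items():
    print(f"{model}: group {group:+.1f}, text {text:+.1f}, video {video:+.1f}")
```

On these rows the group-score difference ranges from +10.4 points for GPT-4o to -0.6 points for InternLM-XC-2.5, so CoT does not help every listed model.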