HyperAI
Video Question Answering on TVBench

Metric: Average Accuracy

Results: performance of various models on this benchmark.
| Model | Average Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| Aria | 51.0 | Aria: An Open Multimodal Native Mixture-of-Experts Model | - |
| PLLaVA-34B | 42.3 | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | - |
| mPLUG-Owl3 | 42.2 | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | - |
| Tarsier-7B | 46.9 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | - |
| LLaVA-Video 7B | 45.6 | Video Instruction Tuning With Synthetic Data | - |
| PLLaVA-7B | 34.9 | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | - |
| IXC-2.5 7B | 51.6 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | - |
| Qwen2-VL-72B | 52.7 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | - |
| Qwen2-VL-7B | 43.8 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | - |
| LLaVA-Video 72B | 50.0 | Video Instruction Tuning With Synthetic Data | - |
| ST-LLM | 35.7 | ST-LLM: Large Language Models Are Effective Temporal Learners | - |
| GPT-4o (8 frames) | 39.9 | GPT-4o System Card | - |
| VideoLLaMA2 72B | 48.4 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | - |
| PLLaVA-13B | 36.4 | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | - |
| VideoGPT+ | 41.7 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | - |
| VideoLLaMA2 7B | 42.9 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | - |
| VideoLLaMA2.1 | 42.1 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | - |
| Tarsier2-7B | 54.7 | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | - |
| VideoChat2 | 35.0 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | - |
| Gemini 1.5 Pro | 47.6 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | - |