Video Question Answering on TVBench
Evaluation Metric
Average Accuracy
Evaluation Results
Performance results of each model on this benchmark
Model | Average Accuracy | Paper Title | Repository
Aria | 51.0 | Aria: An Open Multimodal Native Mixture-of-Experts Model | -
PLLaVA-34B | 42.3 | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | -
mPLUG-Owl3 | 42.2 | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models | -
Tarsier-7B | 46.9 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | -
LLaVA-Video 7B | 45.6 | Video Instruction Tuning With Synthetic Data | -
PLLaVA-7B | 34.9 | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | -
IXC-2.5 7B | 51.6 | InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output | -
Qwen2-VL-72B | 52.7 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | -
Qwen2-VL-7B | 43.8 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | -
LLaVA-Video 72B | 50.0 | Video Instruction Tuning With Synthetic Data | -
ST-LLM | 35.7 | ST-LLM: Large Language Models Are Effective Temporal Learners | -
GPT4o 8 frames | 39.9 | GPT-4o System Card | -
VideoLLaMA2 72B | 48.4 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | -
PLLaVA-13B | 36.4 | PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning | -
VideoGPT+ | 41.7 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding | -
VideoLLaMA2 7B | 42.9 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | -
VideoLLaMA2.1 | 42.1 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | -
Tarsier2-7B | 54.7 | Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding | -
VideoChat2 | 35.0 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | -
Gemini 1.5 Pro | 47.6 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | -
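
The results above are not listed in score order, so a ranked view can be handy when comparing models. The following is a minimal Python sketch that sorts the listed Average Accuracy values from highest to lowest. The scores are copied from the results above; the `scores` dictionary and the `rank` helper are illustrative names introduced here, not part of the benchmark or the site.

```python
# Average Accuracy values copied from the results listed above.
# The dictionary name and helper function are illustrative only.
scores = {
    "Aria": 51.0,
    "PLLaVA-34B": 42.3,
    "mPLUG-Owl3": 42.2,
    "Tarsier-7B": 46.9,
    "LLaVA-Video 7B": 45.6,
    "PLLaVA-7B": 34.9,
    "IXC-2.5 7B": 51.6,
    "Qwen2-VL-72B": 52.7,
    "Qwen2-VL-7B": 43.8,
    "LLaVA-Video 72B": 50.0,
    "ST-LLM": 35.7,
    "GPT4o 8 frames": 39.9,
    "VideoLLaMA2 72B": 48.4,
    "PLLaVA-13B": 36.4,
    "VideoGPT+": 41.7,
    "VideoLLaMA2 7B": 42.9,
    "VideoLLaMA2.1": 42.1,
    "Tarsier2-7B": 54.7,
    "VideoChat2": 35.0,
    "Gemini 1.5 Pro": 47.6,
}


def rank(scores):
    """Return (model, accuracy) pairs sorted from highest to lowest accuracy."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, (model, accuracy) in enumerate(rank(scores), start=1):
        print(f"{position:2d}. {model:<16} {accuracy:.1f}")
```

With the values as listed, Tarsier2-7B ranks first and PLLaVA-7B last.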