Zero-Shot Video Question Answering on EgoSchema
Evaluation metric
Accuracy
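The page does not say how Accuracy is computed. The following is a minimal sketch, assuming the usual multiple-choice definition: the percentage of questions for which the predicted option index equals the ground-truth index. The function names `accuracy` and `chance_accuracy` are hypothetical and do not come from any benchmark toolkit. With five answer options per question, as in EgoSchema, uniform guessing yields 20%, which agrees with the Random entry in the results.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the percentage of questions answered correctly.

    Assumption: each prediction and each answer is the index of the
    chosen option for one multiple-choice question.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy (in percent) of uniform random guessing."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 100.0 / num_options


if __name__ == "__main__":
    # Toy data, not real benchmark outputs: 3 of 4 answers match.
    print(accuracy([0, 1, 2, 3], [0, 1, 2, 4]))
    # Five options per question, so guessing lands at the chance level.
    print(chance_accuracy(5))
```

Reporting the chance level next to model scores makes it easier to judge how far a result sits above guessing.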
Evaluation results
Performance of each model on this benchmark
| Model | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| VideoChat2_HD_mistral | 65.6 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | - |
| MVU (13B) | 60.3 | Understanding Long Videos with Multimodal Language Models | - |
| Random | 20.0 | - | - |
| LangRepo (12B) | 66.2 | Language Repository for Long Video Understanding | - |
| LLoVi (7B) | 50.8 | A Simple LLM Framework for Long-Range Video Question-Answering | - |
| SlowFast-LLaVA-34B | 47.2 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models | - |
| LLoVi (GPT-3.5) | 57.6 | A Simple LLM Framework for Long-Range Video Question-Answering | - |
| Tarsier (34B) | 68.6 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | - |
| SeViLA (4B) | 25.7 | Self-Chained Image-Language Model for Video Localization and Question Answering | - |
| LVNet | 66.0 | Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | - |
| TS-LLaVA-34B | 57.8 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models | - |
| VideoTree (GPT4) | 66.2 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | - |
| VideoChat2_mistral | 63.6 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | - |