HyperAI
HyperAI超神経
Video Question Answering on MSRVTT-QA
Evaluation metric: Accuracy
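On MSRVTT-QA, accuracy is generally the top-1 metric: the percentage of test questions for which the model's answer matches the ground-truth answer. The sketch below shows that computation under some assumptions. It treats a prediction as correct only on an exact string match after trimming whitespace and lowercasing. The helper name `exact_match_accuracy` is hypothetical. Individual papers may normalize answers differently or score against a fixed answer vocabulary.

```python
def exact_match_accuracy(predictions, references):
    """Return top-1 accuracy in percent.

    Assumption: a prediction counts as correct only if it equals the
    ground-truth answer after trimming whitespace and lowercasing.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one question is required")
    # Count exact matches between each prediction and its reference answer.
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical example: 3 of 4 answers match.
preds = ["dog", "man", "cooking", "two"]
refs = ["dog", "woman", "cooking", "two"]
print(exact_match_accuracy(preds, refs))  # 75.0
```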
Evaluation results
Performance of each model on this benchmark
| Model | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| FrozenBiLM | 47.0 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | - |
| mPLUG-2 | 48.0 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | - |
| FrozenBiLM (0-shot) | 16.7 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | - |
| VIOLETv2 | 44.5 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | - |
| Singularity-temporal | 43.9 | Revealing Single Frame Bias for Video-and-Language Learning | - |
| HBI | 46.2 | Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | - |
| VALOR | 49.2 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | - |
| Singularity | 43.5 | Revealing Single Frame Bias for Video-and-Language Learning | - |
| VAST | 50.1 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | - |
| Mirasol3B | 50.42 | Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | - |
| VindLU | 44.6 | VindLU: A Recipe for Effective Video-and-Language Pretraining | - |
| COSA | 49.2 | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | - |
| MA-LMM | 48.5 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | - |
| EMCL-Net | 45.8 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | - |