
Visual Question Answering on MSRVTT-QA

Evaluation Metric

Accuracy

Evaluation Results

Performance of each model on this benchmark

| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| vid-TLDR (UMT-L) | 0.470 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | |
| UMT-L (ViT-L/16) | 0.471 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| CLIPBERT | 0.374 | Less is More: ClipBERT for Video-and-Language Learning via Sparse Sampling | |
| All-in-one+ | 0.395 | Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | |
| Flamingo (32-shot) | 0.310 | Flamingo: a Visual Language Model for Few-Shot Learning | |
| FrozenBiLM+ | 0.470 | Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | |
| All-in-one-B | 0.443 | All in One: Exploring Unified Video-Language Pre-training | |
| HBI | 0.462 | Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | |
| Co-Mem | 0.32 | Motion-Appearance Co-Memory Networks for Video Question Answering | - |
| VideoCoCa | 0.463 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| ST-VQA | 0.309 | TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering | |
| Clover | 0.441 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | |
| ALPRO | 0.421 | Align and Prompt: Video-and-Language Pre-training with Entity Prompts | |
| Co-Tokenization | 0.457 | Video Question Answering with Iterative Video-Text Co-Tokenization | - |
| VLAB | 0.496 | VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending | - |
| HMEMA | 0.33 | Heterogeneous Memory Enhanced Multimodal Attention Model for Video Question Answering | |
| X2-VLM (base) | 0.45 | X$^2$-VLM: All-In-One Pre-trained Model For Vision-Language Tasks | |
| AIO+MDF | 0.438 | Self-Adaptive Sampling for Efficient Video Question-Answering on Image--Text Models | |
| JustAsk+ | 0.418 | Open-vocabulary Video Question Answering: A New Benchmark for Evaluating the Generalizability of Video Question Answering Models | |
| LRCE | 0.42 | Lightweight Recurrent Cross-modal Encoder for Video Question Answering | |