Video Question Answering on MSRVTT-QA
Evaluation Metric
Accuracy
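As a rough illustration of the metric, accuracy on a question-answering benchmark is commonly computed as the percentage of questions whose predicted answer exactly matches the reference answer. The sketch below assumes that exact-match convention, which this page does not spell out. The function name `accuracy` and the sample answers are hypothetical and are not taken from the leaderboard.

```python
def accuracy(predictions, references):
    """Return the percentage of predictions that exactly match the references.

    Assumption: answers are compared as case-insensitive,
    whitespace-trimmed strings (an illustrative convention only).
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("at least one example is required")
    correct = sum(
        p.strip().lower() == r.strip().lower()
        for p, r in zip(predictions, references)
    )
    return 100.0 * correct / len(references)


# Hypothetical answers for four questions (illustrative only).
preds = ["dog", "two", "cooking", "red"]
refs = ["dog", "three", "cooking", "blue"]
score = accuracy(preds, refs)
print(f"{score:.1f}")
```

Two of the four hypothetical predictions match their references, so the score here is 50.0.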
Evaluation Results
Performance of each model on this benchmark
| Model | Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| FrozenBiLM | 47.0 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |
| mPLUG-2 | 48.0 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| FrozenBiLM (0-shot) | 16.7 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | |
| VIOLETv2 | 44.5 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | |
| Singularity-temporal | 43.9 | Revealing Single Frame Bias for Video-and-Language Learning | |
| HBI | 46.2 | Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | |
| VALOR | 49.2 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| Singularity | 43.5 | Revealing Single Frame Bias for Video-and-Language Learning | |
| VAST | 50.1 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| Mirasol3B | 50.42 | Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | - |
| VindLU | 44.6 | VindLU: A Recipe for Effective Video-and-Language Pretraining | |
| COSA | 49.2 | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |
| MA-LMM | 48.5 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
| EMCL-Net | 45.8 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | |
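
The results listed above are not sorted by score. As a convenience, the following minimal sketch ranks them from highest to lowest accuracy. The model names and scores are copied from the listing; the variable names `results` and `ranked` are illustrative.

```python
# (model, accuracy) pairs copied from the results listed on this page.
results = [
    ("FrozenBiLM", 47.0),
    ("mPLUG-2", 48.0),
    ("FrozenBiLM (0-shot)", 16.7),
    ("VIOLETv2", 44.5),
    ("Singularity-temporal", 43.9),
    ("HBI", 46.2),
    ("VALOR", 49.2),
    ("Singularity", 43.5),
    ("VAST", 50.1),
    ("Mirasol3B", 50.42),
    ("VindLU", 44.6),
    ("COSA", 49.2),
    ("MA-LMM", 48.5),
    ("EMCL-Net", 45.8),
]

# Sort by accuracy, highest first. Python's sort is stable,
# so tied entries keep their original listing order.
ranked = sorted(results, key=lambda row: row[1], reverse=True)

for rank, (model, acc) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {model:<22} {acc}")
```

With these numbers, Mirasol3B (50.42) ranks first, followed by VAST (50.1), and the zero-shot FrozenBiLM configuration (16.7) ranks last.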