HyperAI
Video Question Answering on MSRVTT-QA
Metrics
Accuracy
Results
Performance results of various models on this benchmark
| Model name | Accuracy (%) | Paper title | Repository |
|---|---|---|---|
| FrozenBiLM | 47.0 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | - |
| mPLUG-2 | 48.0 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | - |
| FrozenBiLM (0-shot) | 16.7 | Zero-Shot Video Question Answering via Frozen Bidirectional Language Models | - |
| VIOLETv2 | 44.5 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | - |
| Singularity-temporal | 43.9 | Revealing Single Frame Bias for Video-and-Language Learning | - |
| HBI | 46.2 | Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | - |
| VALOR | 49.2 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | - |
| Singularity | 43.5 | Revealing Single Frame Bias for Video-and-Language Learning | - |
| VAST | 50.1 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | - |
| Mirasol3B | 50.42 | Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities | - |
| VindLU | 44.6 | VindLU: A Recipe for Effective Video-and-Language Pretraining | - |
| COSA | 49.2 | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | - |
| MA-LMM | 48.5 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | - |
| EMCL-Net | 45.8 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | - |