Video Captioning on YouCook2
Metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
Results
Performance of various models on this benchmark
| Model | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- | --- |
| HowToCaption | 8.8 | 116.4 | 15.9 | 37.3 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | |
| VAST | 18.2 | 1.99 | - | - | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| VideoBERT + S3D | 4.33 | 0.55 | 11.94 | 28.80 | VideoBERT: A Joint Model for Video and Language Representation Learning | |
| MA-LMM | - | 1.31 | 17.6 | - | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
| OmniVL | 8.72 | 1.16 | 14.83 | 36.09 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - |
| COSA | 10.1 | 1.31 | - | - | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |
| UniVL + MELTR | 17.92 | 1.90 | 22.56 | 47.04 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | |
| UniVL | 17.35 | 1.81 | 22.35 | 46.52 | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | |
| E2vidD6-MASSvid-BiD | 12.04 | 1.22 | 18.32 | 39.03 | Multimodal Pretraining for Dense Video Captioning | |
| TextKG | 11.7 | 1.33 | 14.8 | 40.2 | Text with Knowledge Graph Augmented Transformer for Video Captioning | - |
| VLM | 12.27 | 1.3869 | 18.22 | 41.51 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | |
| VideoCoCa | 14.2 | 1.28 | - | 37.7 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| Zhou | 4.38 | 0.38 | 11.55 | 27.44 | End-to-End Dense Video Captioning with Masked Transformer | |
| COOT | 11.30 | 0.57 | 19.85 | 37.94 | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | |