Video Captioning on YouCook2
Metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
Results
Performance results of different models on this benchmark
| Model Name | BLEU-4 | CIDEr | METEOR | ROUGE-L | Paper Title | Repository |
| HowToCaption | 8.8 | 116.4 | 15.9 | 37.3 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | |
| VAST | 18.2 | 1.99 | - | - | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| VideoBERT + S3D | 4.33 | 0.55 | 11.94 | 28.80 | VideoBERT: A Joint Model for Video and Language Representation Learning | |
| MA-LMM | - | 1.31 | 17.6 | - | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | |
| OmniVL | 8.72 | 1.16 | 14.83 | 36.09 | OmniVL:One Foundation Model for Image-Language and Video-Language Tasks | - |
| COSA | 10.1 | 1.31 | - | - | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |
| UniVL + MELTR | 17.92 | 1.90 | 22.56 | 47.04 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | |
| UniVL | 17.35 | 1.81 | 22.35 | 46.52 | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | |
| E2vidD6-MASSvid-BiD | 12.04 | 1.22 | 18.32 | 39.03 | Multimodal Pretraining for Dense Video Captioning | |
| TextKG | 11.7 | 1.33 | 14.8 | 40.2 | Text with Knowledge Graph Augmented Transformer for Video Captioning | - |
| VLM | 12.27 | 1.3869 | 18.22 | 41.51 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | |
| VideoCoCa | 14.2 | 1.28 | - | 37.7 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| Zhou | 4.38 | 0.38 | 11.55 | 27.44 | End-to-End Dense Video Captioning with Masked Transformer | |
| COOT | 11.30 | 0.57 | 19.85 | 37.94 | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | |