Video Captioning on ActivityNet Captions
Metrics
BLEU4
CIDEr
METEOR
Results
Performance of different models on this benchmark
Model Name | BLEU4 | CIDEr | METEOR | Paper Title | Repository |
---|---|---|---|---|---|
MART (ae-test split) - Appearance + Flow | 10.33 | 23.42 | 15.68 | MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning | |
COOT (ae-test split) - Only Appearance features | 10.85 | 28.19 | 15.99 | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | |
VLTinT (ae-test split) C3D/Ling | 14.5 | 31.13 | 17.97 | VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning | |
VideoCoCa | 14.7 | 39.3 | - | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
VLCap (ae-test split) - Appearance + Language | 13.38 | 31.29 | 17.48 | VLCap: Vision-Language with Contrastive Learning for Coherent Video Paragraph Captioning | |