# Video Captioning on VATEX
## Evaluation Metrics

- BLEU-4
- CIDEr
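BLEU-4 scores a candidate caption by its clipped 1- to 4-gram precision against reference captions, combined with a brevity penalty; CIDEr additionally applies TF-IDF weighting over the reference corpus. As a rough illustration of what the BLEU-4 column measures, here is a minimal sentence-level sketch in pure Python (the official evaluation uses corpus-level tooling such as `pycocoevalcap`; this sketch is illustrative only):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, references):
    """Sentence-level BLEU-4: geometric mean of clipped 1..4-gram
    precisions times a brevity penalty. candidate is a token list,
    references is a list of token lists."""
    precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(candidate, n)
        if not cand_counts:
            return 0.0  # candidate too short to form any n-grams
        # Clip each candidate n-gram count by its max count in any reference.
        max_ref = Counter()
        for ref in references:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        if clipped == 0:
            return 0.0  # no smoothing in this sketch
        precisions.append(clipped / sum(cand_counts.values()))
    # Brevity penalty against the closest reference length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)
```

A perfect match scores 1.0, e.g. `bleu4("a man plays the guitar".split(), ["a man plays the guitar".split()])`; reported leaderboard numbers are corpus-level percentages, not per-sentence scores like this.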
## Evaluation Results

Performance of each model on this benchmark:
| Model | BLEU-4 | CIDEr | Paper Title | Repository |
|---|---|---|---|---|
| VALOR | 45.6 | 95.8 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | - |
| VAST | 45.0 | 99.5 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | - |
| COSA | 43.7 | 96.5 | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | - |
| VideoCoCa | 39.7 | 77.8 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| IcoCap (ViT-B/16) | 37.4 | 67.8 | IcoCap: Improving Video Captioning by Compounding Images | - |
| IcoCap (ViT-B/32) | 36.9 | 63.4 | IcoCap: Improving Video Captioning by Compounding Images | - |
| VASTA (Kinetics-backbone) | 36.25 | 65.07 | Diverse Video Captioning by Adaptive Spatio-temporal Attention | - |
| CoCap (ViT/L14) | 35.8 | 64.8 | Accurate and Fast Compressed Video Captioning | - |
| ORG-TRL | 32.1 | 49.7 | Object Relational Graph with Teacher-Recommended Learning for Video Captioning | - |
| NITS-VC | 20.0 | 24.0 | NITS-VC System for VATEX Video Captioning Challenge 2020 | - |