Video Captioning on MSVD
Metrics
BLEU-4: modified n-gram precision up to 4-grams, with a brevity penalty
CIDEr: consensus-based similarity to reference captions using TF-IDF-weighted n-grams
METEOR: unigram matching with stemming and synonym support, weighted toward recall
ROUGE-L: F-measure based on the longest common subsequence with the reference
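Leaderboard scores are typically produced with the standard COCO caption evaluation toolkit; as an illustration of what BLEU-4 measures, below is a minimal, self-contained sketch of sentence-level BLEU-4 (clipped n-gram precision plus brevity penalty). The function name and example captions are illustrative, not taken from any listed model.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu4(candidate, references):
    """Sentence-level BLEU-4: geometric mean of clipped 1..4-gram
    precisions, multiplied by a brevity penalty."""
    cand = candidate.split()
    refs = [r.split() for r in references]
    precisions = []
    for n in range(1, 5):
        cand_counts = ngrams(cand, n)
        if not cand_counts:          # candidate shorter than n tokens
            return 0.0
        # clip each candidate n-gram count by its max count in any reference
        max_ref = Counter()
        for ref in refs:
            for gram, cnt in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], cnt)
        clipped = sum(min(cnt, max_ref[gram]) for gram, cnt in cand_counts.items())
        precisions.append(clipped / sum(cand_counts.values()))
    if min(precisions) == 0:
        return 0.0
    # brevity penalty against the reference length closest to the candidate's
    ref_len = min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
    bp = 1.0 if len(cand) > ref_len else math.exp(1 - ref_len / len(cand))
    return bp * math.exp(sum(math.log(p) for p in precisions) / 4)

# An exact match scores 1.0; a paraphrase that breaks the 4-grams scores lower.
print(bleu4("a man is playing a guitar", ["a man is playing a guitar"]))
print(bleu4("a man plays guitar", ["a man is playing a guitar"]))
```

Reported BLEU-4 values (e.g. 56.3) are corpus-level scores multiplied by 100; the toolkit also applies tokenization and smoothing choices that this sketch omits.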
Results
Performance of various models on the MSVD video captioning benchmark (higher is better for all metrics).
Comparison Table
Model | BLEU-4 | CIDEr | METEOR | ROUGE-L |
---|---|---|---|---|
icocap-improving-video-captioning-by | 56.3 | 103.8 | 38.9 | 75.0 |
rtq-rethinking-video-language-understanding | 66.9 | 123.4 | - | 82.2 |
valor-vision-audio-language-omni-perception | 80.7 | 178.5 | 51.0 | 87.9 |
cosa-concatenated-sample-pretrained-vision | 76.5 | 178.5 | - | - |
mplug-2-a-modularized-multi-modal-foundation | 70.5 | 165.8 | 48.4 | 85.3 |
accurate-and-fast-compressed-video-captioning | 60.1 | 121.5 | 41.4 | 78.2 |
mammut-a-simple-architecture-for-joint | - | 195.6 | - | - |
diverse-video-captioning-by-adaptive-spatio | 59.2 | 119.7 | 40.65 | 76.7 |
icocap-improving-video-captioning-by | 59.1 | 110.3 | 39.5 | 76.5 |
diverse-video-captioning-by-adaptive-spatio | 56.1 | 106.4 | 39.1 | 74.5 |
an-empirical-study-of-end-to-end-video | - | 139.2 | - | - |
vid2seq-large-scale-pretraining-of-a-visual | - | 146.2 | 45.3 | - |
vlab-enhancing-video-language-pre-training-by | 79.3 | 179.8 | 51.2 | 87.9 |
howtocaption-prompting-llms-to-transform | 70.4 | 154.2 | 46.4 | 83.2 |
sem-pos-grammatically-and-semantically | 60.1 | 108.3 | 38.5 | 76.0 |
hitea-hierarchical-temporal-aware-video | 71.0 | 146.9 | 45.3 | 81.4 |