Video Captioning on MSR-VTT
Metrics
BLEU-4
CIDEr
METEOR
ROUGE-L
Results
Performance of various models on this benchmark
Comparison table
Model name | BLEU-4 | CIDEr | METEOR | ROUGE-L |
---|---|---|---|---|
howtocaption-prompting-llms-to-transform | 49.8 | 65.3 | 32.2 | 66.3 |
mplug-2-a-modularized-multi-modal-foundation | 57.8 | 80.0 | 34.9 | 70.1 |
vid2seq-large-scale-pretraining-of-a-visual | - | 64.6 | 30.8 | - |
video-text-modeling-with-zero-shot-transfer | 53.8 | 73.2 | - | 68.0 |
meltr-meta-loss-transformer-for-learning-to | 44.17 | 52.77 | 29.26 | 62.35 |
vast-a-vision-audio-subtitle-text-omni-1 | 56.7 | 78.0 | - | - |
hitea-hierarchical-temporal-aware-video | 49.2 | 65.1 | 30.7 | 65.0 |
text-with-knowledge-graph-augmented | 46.6 | 60.8 | 30.5 | 64.8 |
cosa-concatenated-sample-pretrained-vision | 53.7 | 74.7 | - | - |
an-empirical-study-of-end-to-end-video | - | 58 | - | - |
accurate-and-fast-compressed-video-captioning | 44.4 | 57.2 | 30.3 | 63.4 |
icocap-improving-video-captioning-by | 47.0 | 60.2 | 31.1 | 64.9 |
vlab-enhancing-video-language-pre-training-by | 54.6 | 74.9 | 33.4 | 68.3 |
sem-pos-grammatically-and-semantically | 45.2 | 53.1 | 30.7 | 64.1 |
clip-meets-video-captioners-attribute-aware | 48.2 | 58.7 | 31.3 | 64.8 |
icocap-improving-video-captioning-by | 46.1 | 59.1 | 30.3 | 64.3 |
git-a-generative-image-to-text-transformer | 54.8 | 75.9 | 33.1 | 68.2 |
mammut-a-simple-architecture-for-joint | - | 73.6 | - | - |
rtq-rethinking-video-language-understanding | 49.6 | 69.3 | - | 66.1 |
expectation-maximization-contrastive-learning | 45.3 | 54.6 | 30.2 | 63.2 |
valor-vision-audio-language-omni-perception | 54.4 | 74.0 | 32.9 | 68.0 |
end-to-end-generative-pretraining-for | 48.9 | 60.0 | 38.7 | 64.0 |
diverse-video-captioning-by-adaptive-spatio | 44.21 | 56.08 | 30.24 | 62.9 |
diverse-video-captioning-by-adaptive-spatio | 43.4 | 55 | 30.2 | 62.5 |
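
Because several entries report only a subset of the four metrics, comparing models on a single metric means skipping the rows where that score is "-". The following is a minimal sketch of one way to do that, assuming the rows are available as pipe-delimited text in the layout shown in the table. The helper names (`parse_score`, `parse_rows`, `rank_by`) are hypothetical and chosen only for illustration; the four sample rows are copied from the table.

```python
from typing import Dict, List, Optional

# A few rows copied verbatim from the table; "-" marks a score that was not reported.
ROWS = """\
mplug-2-a-modularized-multi-modal-foundation | 57.8 | 80.0 | 34.9 | 70.1 |
vast-a-vision-audio-subtitle-text-omni-1 | 56.7 | 78.0 | - | - |
git-a-generative-image-to-text-transformer | 54.8 | 75.9 | 33.1 | 68.2 |
vid2seq-large-scale-pretraining-of-a-visual | - | 64.6 | 30.8 | - |
"""

# Column order of the table header.
METRICS = ["BLEU-4", "CIDEr", "METEOR", "ROUGE-L"]


def parse_score(cell: str) -> Optional[float]:
    """Convert one table cell to a float, or None when the score is missing."""
    cell = cell.strip()
    return None if cell == "-" else float(cell)


def parse_rows(text: str) -> List[Dict[str, object]]:
    """Parse pipe-delimited rows (hypothetical helper, not part of any library)."""
    records = []
    for line in text.strip().splitlines():
        # Drop the trailing pipe, then split into the name plus four score cells.
        cells = [c.strip() for c in line.strip().rstrip("|").split("|")]
        record: Dict[str, object] = {"model": cells[0]}
        for metric, cell in zip(METRICS, cells[1:]):
            record[metric] = parse_score(cell)
        records.append(record)
    return records


def rank_by(records: List[Dict[str, object]], metric: str) -> List[Dict[str, object]]:
    """Sort descending by one metric, leaving out models that did not report it."""
    reported = [r for r in records if r[metric] is not None]
    return sorted(reported, key=lambda r: r[metric], reverse=True)


records = parse_rows(ROWS)
for rank, row in enumerate(rank_by(records, "METEOR"), start=1):
    print(rank, row["model"], row["METEOR"])
```

Leaving unreported scores as `None` rather than 0 keeps a model with a missing value from being ranked as if it had scored last on that metric.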