HyperAI

Image Captioning On Nocaps Val Overall

Metrics

CIDEr
Pretrain (#images)
SPICE

Results

Performance results of various models on this benchmark

Comparison table
| Model name | CIDEr | Pretrain (#images) | SPICE |
| --- | --- | --- | --- |
| blip-bootstrapping-language-image-pre | 109.6 | 129M | 14.7 |
| blip-bootstrapping-language-image-pre | 113.2 | 129M | 14.8 |
| blip-2-bootstrapping-language-image-pre | 121.6 | 1.1B | 15.8 |
| conceptual-12m-pushing-web-scale-image-text | 90.2 | - | 12.1 |
| blip-2-bootstrapping-language-image-pre | 121.0 | 1.1B | 15.3 |
| oscar-object-semantics-aligned-pre-training | 80.9 | 345M | 11.3 |
| vinvl-making-visual-representations-matter-in | 95.5 | 5.7M | 13.5 |
| blip-2-bootstrapping-language-image-pre | 119.7 | 1.1B | 15.4 |
| simvlm-simple-visual-language-model | 112.2 | 1.8B | - |
| omnivl-one-foundation-model-for-image | 107.5 | 14M | 14.7 |
| scaling-up-vision-language-pre-training-for | 113.4 | 200M | 15.0 |