HyperAI

Image Captioning On Nocaps Val In Domain

Metrics

CIDEr
Pre-train (#images)
SPICE

Results

Performance results of various models on this benchmark

Comparison table
Model name                                     CIDEr   Pre-train (#images)   SPICE
scaling-up-vision-language-pre-training-for    107.7   200M                  14.7
blip-2-bootstrapping-language-image-pre        123.7   1.1B                  15.8
vinvl-making-visual-representations-matter-in  103.1   5.7M                  14.2
blip-2-bootstrapping-language-image-pre        123.7   1.1B                  16.3
blip-bootstrapping-language-image-pre          114.9   129M                  15.2
simvlm-simple-visual-language-model            113.7   1.8B                  -
scaling-up-vision-language-pre-training-for    116.9   200M                  15.8
blip-bootstrapping-language-image-pre          111.8   129M                  14.9
conceptual-12m-pushing-web-scale-image-text    92.6    15M                   12.5
omnivl-one-foundation-model-for-image          104.6   14M                   15
blip-2-bootstrapping-language-image-pre        123     1.1B                  15.8
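
The rows above can be compared programmatically. The following is a minimal sketch, not part of the original page: the tuple layout and the helper names `parse_image_count` and `rank_by_cider` are illustrative assumptions. The values are copied from the table, and the truncated model identifiers are kept as listed.

```python
# Rows copied from the comparison table:
# (model identifier as listed, CIDEr, pre-train images, SPICE).
# SPICE is None where the table shows "-".
ROWS = [
    ("scaling-up-vision-language-pre-training-for", 107.7, "200M", 14.7),
    ("blip-2-bootstrapping-language-image-pre", 123.7, "1.1B", 15.8),
    ("vinvl-making-visual-representations-matter-in", 103.1, "5.7M", 14.2),
    ("blip-2-bootstrapping-language-image-pre", 123.7, "1.1B", 16.3),
    ("blip-bootstrapping-language-image-pre", 114.9, "129M", 15.2),
    ("simvlm-simple-visual-language-model", 113.7, "1.8B", None),
    ("scaling-up-vision-language-pre-training-for", 116.9, "200M", 15.8),
    ("blip-bootstrapping-language-image-pre", 111.8, "129M", 14.9),
    ("conceptual-12m-pushing-web-scale-image-text", 92.6, "15M", 12.5),
    ("omnivl-one-foundation-model-for-image", 104.6, "14M", 15.0),
    ("blip-2-bootstrapping-language-image-pre", 123.0, "1.1B", 15.8),
]

# Hypothetical helper: suffix multipliers for the "Pre-train (#images)" column.
_SUFFIX = {"M": 1_000_000, "B": 1_000_000_000}


def parse_image_count(text):
    """Convert a size such as '129M' or '1.1B' into an integer image count."""
    return int(round(float(text[:-1]) * _SUFFIX[text[-1]]))


def rank_by_cider(rows):
    """Return the rows sorted by CIDEr, highest first (ties keep table order)."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, cider, images, spice in rank_by_cider(ROWS):
        spice_text = "-" if spice is None else f"{spice:g}"
        print(f"{cider:6.1f}  {parse_image_count(images):>13,}  {spice_text:>5}  {name}")
```

Sorting on CIDEr alone is a simplification. Because SPICE is missing for one entry, any ranking that uses SPICE would need an explicit rule for that row.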