HyperAI

Image Captioning on nocaps val (near-domain)

Metrics

CIDEr
Pre-train (#images)
SPICE

Results

Performance results of various models on this benchmark

Comparison table
| Model name | CIDEr | Pre-train (#images) | SPICE |
|---|---|---|---|
| blip-2-bootstrapping-language-image-pre | 119.2 | 1.1B | 15.3 |
| omnivl-one-foundation-model-for-image | 108.3 | 14M | 14.9 |
| vinvl-making-visual-representations-matter-in | 96.1 | 5.7M | 13.8 |
| blip-2-bootstrapping-language-image-pre | 120.2 | 1.1B | 15.9 |
| conceptual-12m-pushing-web-scale-image-text | 88.3 | - | 12.1 |
| blip-bootstrapping-language-image-pre | 112.1 | 129M | 14.9 |
| scaling-up-vision-language-pre-training-for | 113.3 | 200M | 15.1 |
| simvlm-simple-visual-language-model | 110.9 | 1.8B | - |
| blip-bootstrapping-language-image-pre | 108.6 | 129M | 14.8 |
| blip-2-bootstrapping-language-image-pre | 117.8 | 1.1B | 15.4 |