Image Captioning on nocaps val (in-domain)
Metrics
CIDEr
Pre-train (#images)
SPICE
Results
Performance results of various models on this benchmark
Comparison table
Model name | CIDEr | Pre-train (#images) | SPICE |
---|---|---|---|
scaling-up-vision-language-pre-training-for | 107.7 | 200M | 14.7 |
blip-2-bootstrapping-language-image-pre | 123.7 | 1.1B | 15.8 |
vinvl-making-visual-representations-matter-in | 103.1 | 5.7M | 14.2 |
blip-2-bootstrapping-language-image-pre | 123.7 | 1.1B | 16.3 |
blip-bootstrapping-language-image-pre | 114.9 | 129M | 15.2 |
simvlm-simple-visual-language-model | 113.7 | 1.8B | - |
scaling-up-vision-language-pre-training-for | 116.9 | 200M | 15.8 |
blip-bootstrapping-language-image-pre | 111.8 | 129M | 14.9 |
conceptual-12m-pushing-web-scale-image-text | 92.6 | 15M | 12.5 |
omnivl-one-foundation-model-for-image | 104.6 | 14M | 15 |
blip-2-bootstrapping-language-image-pre | 123 | 1.1B | 15.8 |
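
The pre-training sizes in the table use suffixes (M, B), so they do not sort correctly as plain strings. The following is a minimal sketch of how the rows above could be loaded and ranked by CIDEr. The helper names `parse_image_count` and `rank_by_cider` are hypothetical and introduced only for illustration; the values are copied from the table, with the missing SPICE entry stored as `None`.

```python
from typing import List, Optional, Tuple

# (model name, CIDEr, pre-train #images as written, SPICE or None if not reported)
Row = Tuple[str, float, str, Optional[float]]

ROWS: List[Row] = [
    ("scaling-up-vision-language-pre-training-for", 107.7, "200M", 14.7),
    ("blip-2-bootstrapping-language-image-pre", 123.7, "1.1B", 15.8),
    ("vinvl-making-visual-representations-matter-in", 103.1, "5.7M", 14.2),
    ("blip-2-bootstrapping-language-image-pre", 123.7, "1.1B", 16.3),
    ("blip-bootstrapping-language-image-pre", 114.9, "129M", 15.2),
    ("simvlm-simple-visual-language-model", 113.7, "1.8B", None),
    ("scaling-up-vision-language-pre-training-for", 116.9, "200M", 15.8),
    ("blip-bootstrapping-language-image-pre", 111.8, "129M", 14.9),
    ("conceptual-12m-pushing-web-scale-image-text", 92.6, "15M", 12.5),
    ("omnivl-one-foundation-model-for-image", 104.6, "14M", 15.0),
    ("blip-2-bootstrapping-language-image-pre", 123.0, "1.1B", 15.8),
]

# Assumed meaning of the suffixes: K = thousand, M = million, B = billion.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_image_count(text: str) -> int:
    """Convert a size such as '129M' or '1.1B' into an integer image count."""
    text = text.strip()
    suffix = text[-1].upper() if text else ""
    if suffix in _SUFFIXES:
        return round(float(text[:-1]) * _SUFFIXES[suffix])
    return round(float(text))


def rank_by_cider(rows: List[Row]) -> List[Row]:
    """Return the rows sorted by CIDEr, highest first (stable for ties)."""
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for name, cider, images, spice in rank_by_cider(ROWS):
        spice_text = "-" if spice is None else f"{spice:.1f}"
        print(f"{cider:6.1f}  {spice_text:>5}  {parse_image_count(images):>13,}  {name}")
```

Because several entries share a model name (different variants of the same paper), the sketch keeps every row rather than deduplicating by name.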