HyperAI超神经

Image Captioning On Nocaps Val Near Domain

Evaluation Metrics

CIDEr
Pre-train (#images)
SPICE

Evaluation Results

Results of each model on this benchmark

Comparison Table
Model name                                     CIDEr  Pre-train (#images)  SPICE
blip-2-bootstrapping-language-image-pre        119.2  1.1B                 15.3
omnivl-one-foundation-model-for-image          108.3  14M                  14.9
vinvl-making-visual-representations-matter-in  96.1   5.7M                 13.8
blip-2-bootstrapping-language-image-pre        120.2  1.1B                 15.9
conceptual-12m-pushing-web-scale-image-text    88.3   -                    12.1
blip-bootstrapping-language-image-pre          112.1  129M                 14.9
scaling-up-vision-language-pre-training-for    113.3  200M                 15.1
simvlm-simple-visual-language-model            110.9  1.8B                 -
blip-bootstrapping-language-image-pre          108.6  129M                 14.8
blip-2-bootstrapping-language-image-pre        117.8  1.1B                 15.4