Image Captioning On Nocaps Val Overall
Evaluation metrics
CIDEr
Pretrain (#images)
SPICE
Evaluation results
Performance of each model on this benchmark
Comparison table
Model | CIDEr | Pretrain (#images) | SPICE |
---|---|---|---|
blip-bootstrapping-language-image-pre | 109.6 | 129M | 14.7 |
blip-bootstrapping-language-image-pre | 113.2 | 129M | 14.8 |
blip-2-bootstrapping-language-image-pre | 121.6 | 1.1B | 15.8 |
conceptual-12m-pushing-web-scale-image-text | 90.2 | - | 12.1 |
blip-2-bootstrapping-language-image-pre | 121.0 | 1.1B | 15.3 |
oscar-object-semantics-aligned-pre-training | 80.9 | 345M | 11.3 |
vinvl-making-visual-representations-matter-in | 95.5 | 5.7M | 13.5 |
blip-2-bootstrapping-language-image-pre | 119.7 | 1.1B | 15.4 |
simvlm-simple-visual-language-model | 112.2 | 1.8B | - |
omnivl-one-foundation-model-for-image | 107.5 | 14M | 14.7 |
scaling-up-vision-language-pre-training-for | 113.4 | 200M | 15.0 |
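As a reading aid, here is a minimal Python sketch that ranks the rows above by CIDEr or SPICE. It treats "-" cells as not reported and converts the pretraining image counts (e.g. "129M", "1.1B") to integers. The names `ROWS`, `parse_count`, and `rank_by` are hypothetical helpers, and the values are copied from the table. Several model slugs repeat, and the table does not say which variant each repeated row refers to, so compare rows as rows rather than by name.

```python
# Rows copied from the comparison table: (model slug, CIDEr, pretrain images, SPICE).
# None stands for a "-" (not reported) cell. Repeated slugs are kept as separate rows.
ROWS = [
    ("blip-bootstrapping-language-image-pre", 109.6, "129M", 14.7),
    ("blip-bootstrapping-language-image-pre", 113.2, "129M", 14.8),
    ("blip-2-bootstrapping-language-image-pre", 121.6, "1.1B", 15.8),
    ("conceptual-12m-pushing-web-scale-image-text", 90.2, None, 12.1),
    ("blip-2-bootstrapping-language-image-pre", 121.0, "1.1B", 15.3),
    ("oscar-object-semantics-aligned-pre-training", 80.9, "345M", 11.3),
    ("vinvl-making-visual-representations-matter-in", 95.5, "5.7M", 13.5),
    ("blip-2-bootstrapping-language-image-pre", 119.7, "1.1B", 15.4),
    ("simvlm-simple-visual-language-model", 112.2, "1.8B", None),
    ("omnivl-one-foundation-model-for-image", 107.5, "14M", 14.7),
    ("scaling-up-vision-language-pre-training-for", 113.4, "200M", 15.0),
]


def parse_count(text):
    """Convert a count such as '129M' or '1.1B' to an int; None if not reported."""
    if text is None:
        return None
    scale = {"M": 10**6, "B": 10**9}[text[-1]]
    return int(round(float(text[:-1]) * scale))


def rank_by(rows, index):
    """Return rows that report a value at `index`, sorted from best to worst."""
    reported = [row for row in rows if row[index] is not None]
    return sorted(reported, key=lambda row: row[index], reverse=True)


def cell(value):
    """Render a table cell, using '-' for values that were not reported."""
    return "-" if value is None else str(value)


if __name__ == "__main__":
    # Print the leaderboard ordered by CIDEr (column index 1).
    for model, cider, pretrain, spice in rank_by(ROWS, 1):
        print(f"{cider:6.1f}  {cell(spice):>5}  {cell(pretrain):>5}  {model}")
```

Ranking by SPICE instead (`rank_by(ROWS, 3)`) drops the one row that does not report SPICE. `parse_count` can be used to sort or filter by pretraining data size.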