Image Captioning on nocaps val (near-domain)
Metrics
CIDEr
Pre-train (#images)
SPICE
Results
Performance results of various models on this benchmark
Comparison table
Model name | CIDEr | Pre-train (#images) | SPICE |
---|---|---|---|
blip-2-bootstrapping-language-image-pre | 119.2 | 1.1B | 15.3 |
omnivl-one-foundation-model-for-image | 108.3 | 14M | 14.9 |
vinvl-making-visual-representations-matter-in | 96.1 | 5.7M | 13.8 |
blip-2-bootstrapping-language-image-pre | 120.2 | 1.1B | 15.9 |
conceptual-12m-pushing-web-scale-image-text | 88.3 | - | 12.1 |
blip-bootstrapping-language-image-pre | 112.1 | 129M | 14.9 |
scaling-up-vision-language-pre-training-for | 113.3 | 200M | 15.1 |
simvlm-simple-visual-language-model | 110.9 | 1.8B | - |
blip-bootstrapping-language-image-pre | 108.6 | 129M | 14.8 |
blip-2-bootstrapping-language-image-pre | 117.8 | 1.1B | 15.4 |