Image Captioning on nocaps (Out-of-Domain)
Metrics
CIDEr
SPICE
Results
CIDEr and SPICE scores reported for models on the nocaps out-of-domain subset. Higher is better for both metrics.
Comparison Table
Model Name | CIDEr | SPICE |
---|---|---|
clipcap-clip-prefix-for-image-captioning | 49.14 | 9.57 |
Model 2 | 21.3 | 7.2 |
clipcap-clip-prefix-for-image-captioning | 49.35 | 9.7 |
Model 4 | 72.13 | 11.53 |
Model 5 | 30.09 | 8.08 |
Model 6 | 26.55 | 7.72 |
Model 7 | 58.48 | 8.77 |
Model 8 | 30.09 | 8.08 |
vivo-surpassing-human-performance-in-novel | 110.14 | 13.74 |
Model 10 | 71.43 | 10.57 |
Model 11 | 48.73 | 8.2 |
Model 12 | 88.54 | 13.87 |
Model 13 | 70.21 | 10.15 |
Model 14 | 103.75 | 13.75 |
Model 15 | 85.18 | 11.18 |
Model 16 | 68.92 | 10.05 |
Model 17 | 87.51 | 12.52 |
Model 18 | 77.39 | 11.59 |
Model 19 | 23.07 | 7.4 |
Model 20 | 68.5 | 10.01 |
Model 21 | 54.56 | 9.9 |
git-a-generative-image-to-text-transformer | 122.27 | 15.62 |
vinvl-making-visual-representations-matter-in | 78.01 | 11.48 |
simvlm-simple-visual-language-model | 109.49 | 13.89 |
Model 25 | 26.25 | 7.52 |
Model 26 | 91.62 | 14.21 |
Model 27 | 87.15 | 11.43 |
Model 28 | 121.69 | 15.13 |
Model 29 | 36.12 | 9.39 |
git-a-generative-image-to-text-transformer | 122.04 | 15.7 |
Model 31 | 39.39 | 7.62 |
Model 32 | 75.39 | 10.68 |
Model 33 | 66.67 | 9.74 |
Model 34 | 43.2 | 9.35 |
Model 35 | 78.91 | 12.14 |
Model 36 | 25.91 | 7.61 |
Model 37 | 73.75 | 9.72 |
pali-a-jointly-scaled-multilingual-language | 126.67 | 15.49 |
Model 39 | 106.55 | 14.21 |
grit-faster-and-better-image-captioning | 72.6 | 11.1 |