Image Captioning on nocaps-XD (Out-of-Domain)
Metrics
B1 (BLEU-1)
B2 (BLEU-2)
B3 (BLEU-3)
B4 (BLEU-4)
CIDEr
METEOR
ROUGE-L
SPICE
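Leaderboard captioning metrics such as these are typically computed with the standard COCO caption evaluation toolkit. Purely as an illustration of what the B1 column measures, here is a minimal sketch of clipped unigram precision (BLEU-1 without the corpus-level brevity penalty); the function name `bleu1` and the example captions are our own, not part of the benchmark:

```python
from collections import Counter

def bleu1(candidate, references):
    """Clipped unigram precision (BLEU-1 per-sentence sketch,
    omitting the corpus-level brevity penalty).

    `candidate` is a tokenized hypothesis caption; `references`
    is a list of tokenized reference captions.
    """
    cand_counts = Counter(candidate)
    # For each unigram, its count is clipped by the maximum number
    # of times it appears in any single reference caption.
    max_ref = Counter()
    for ref in references:
        for tok, n in Counter(ref).items():
            max_ref[tok] = max(max_ref[tok], n)
    clipped = sum(min(n, max_ref[tok]) for tok, n in cand_counts.items())
    return clipped / max(len(candidate), 1)

# Example: 2 of the 3 candidate unigrams match the reference.
score = bleu1("a dog runs".split(), ["a dog is running".split()])
print(score)  # 2/3
```

The official scores in the table below additionally apply the brevity penalty and average over the full evaluation set, so this sketch is only meant to convey the intuition behind the metric.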
Results
Performance of various models on this benchmark, sorted by CIDEr:

Model Name | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---
GIT2 | 86.28 | 71.15 | 52.36 | 30.15 | 122.27 | 30.15 | 60.91 | 15.62 | GIT: A Generative Image-to-text Transformer for Vision and Language | - |
GIT | 85.99 | 71.28 | 52.66 | 30.04 | 122.04 | 30.45 | 60.96 | 15.7 | GIT: A Generative Image-to-text Transformer for Vision and Language | - |
Microsoft Cognitive Services team | 79.44 | 61.15 | 41.03 | 21.79 | 95.5 | 26.56 | 55.49 | 12.66 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
Human | 74.84 | 53.9 | 33.51 | 16.6 | 91.62 | 26.83 | 51.5 | 14.21 | - | - |
VLAF2 | 79.59 | 61.04 | 40.09 | 19.61 | 90.34 | 26.14 | 54.86 | 13.11 | - | - |
icp2ssi1_coco_si_0.02_5_test | 75.59 | 56.71 | 35.63 | 17.72 | 85.28 | 23.77 | 51.92 | 11.28 | - | - |
test_cbs2 | 74.5 | 53.63 | 30.91 | 13.41 | 77.94 | 23.47 | 49.66 | 11.07 | - | - |
UpDown + ELMo + CBS | 71.57 | 48.58 | 25.77 | 9.68 | 66.67 | 20.88 | 47.13 | 9.74 | - | - |
Neural Baby Talk + CBS | 65.98 | 43.2 | 21.16 | 7.5 | 58.48 | 19.04 | 44.47 | 8.77 | - | - |
Neural Baby Talk | 64.45 | 42.8 | 21.48 | 7.92 | 48.73 | 18.31 | 44.11 | 8.2 | - | - |
UpDown | 66.54 | 44.28 | 24.23 | 10.17 | 30.09 | 18.29 | 44.84 | 8.08 | - | - |