Image Captioning on nocaps XD Entire
Metrics
B1 (BLEU-1)
B2 (BLEU-2)
B3 (BLEU-3)
B4 (BLEU-4)
CIDEr
METEOR
ROUGE-L
SPICE
Results
Performance results of various models on this benchmark
Model | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---|
Neural Baby Talk | 72.33 | 52.42 | 30.83 | 14.73 | 53.36 | 21.52 | 48.87 | 9.15 | - | - |
Microsoft Cognitive Services team | 85.62 | 71.36 | 53.62 | 34.65 | 114.25 | 31.27 | 61.2 | 14.85 | Scaling Up Vision-Language Pre-training for Image Captioning | - |
Human | 76.64 | 56.46 | 36.37 | 19.48 | 85.34 | 28.15 | 52.83 | 14.67 | - | - |
test_cbs2 | 79.17 | 60.29 | 39.06 | 20.81 | 85.02 | 26.54 | 53.39 | 12.74 | - | - |
GIT | 88.1 | 74.81 | 57.68 | 37.35 | 123.39 | 32.5 | 63.12 | 15.94 | GIT: A Generative Image-to-text Transformer for Vision and Language | - |
UpDown + ELMo + CBS | 76.59 | 56.74 | 35.39 | 18.41 | 73.09 | 24.42 | 51.82 | 11.2 | - | - |
GIT2 | 88.43 | 75.02 | 57.87 | 37.65 | 124.77 | 32.56 | 63.19 | 16.06 | GIT: A Generative Image-to-text Transformer for Vision and Language | - |
UpDown | 74.0 | 55.11 | 35.23 | 19.16 | 54.25 | 22.96 | 50.92 | 10.14 | - | - |
Neural Baby Talk + CBS | 73.42 | 52.12 | 29.35 | 12.88 | 61.48 | 22.06 | 48.74 | 9.69 | - | - |
icp2ssi1_coco_si_0.02_5_test | 78.77 | 61.54 | 41.85 | 23.77 | 85.3 | 25.96 | 54.59 | 11.84 | - | - |
Microsoft Cognitive Services team | 82.27 | 66.04 | 47.48 | 28.95 | 100.12 | 29.47 | 58.26 | 14.04 | VIVO: Visual Vocabulary Pre-Training for Novel Object Captioning | - |
VLAF2 | 83.69 | 67.96 | 49.38 | 29.69 | 102.39 | 29.68 | 58.99 | 14.71 | - | - |
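Leaderboards like this one are conventionally ranked by CIDEr. The sketch below re-ranks the entries above by that column; the `leaderboard` list is copied from the table (the two "Microsoft Cognitive Services team" submissions are disambiguated by their paper), and the ranking itself is a plain descending sort.

```python
# Leaderboard entries (model, CIDEr) copied from the table above.
leaderboard = [
    ("Neural Baby Talk", 53.36),
    ("Microsoft Cognitive Services team (Scaling Up VL Pre-training)", 114.25),
    ("Human", 85.34),
    ("test_cbs2", 85.02),
    ("GIT", 123.39),
    ("UpDown + ELMo + CBS", 73.09),
    ("GIT2", 124.77),
    ("UpDown", 54.25),
    ("Neural Baby Talk + CBS", 61.48),
    ("icp2ssi1_coco_si_0.02_5_test", 85.30),
    ("Microsoft Cognitive Services team (VIVO)", 100.12),
    ("VLAF2", 102.39),
]

# Rank by CIDEr, highest first.
ranked = sorted(leaderboard, key=lambda entry: entry[1], reverse=True)
for rank, (model, cider) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {model:60s} CIDEr {cider:6.2f}")
```

Sorting makes the spread visible at a glance: GIT2 leads at 124.77 CIDEr, several systems sit above the human baseline of 85.34, and the Neural Baby Talk variants trail the field.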