Image Captioning On Nocaps Near Domain
Metrics
B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE
Results
Performance results of various models on this benchmark
Model Name | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
---|---|---|---|---|---|---|---|---|---|---|
Neural Baby Talk + CBS | 74.77 | 53.67 | 30.66 | 13.85 | 61.98 | 22.55 | 49.45 | 9.83 | - | - |
ClipCap (Transformer) | - | - | - | - | 66.82 | - | - | 10.92 | ClipCap: CLIP Prefix for Image Captioning | |
Xinyi | 79.59 | 60.52 | 38.95 | 20.72 | 79.44 | 25.64 | 53.18 | 11.88 | - | - |
GIT2, Single Model | 88.9 | 75.86 | 58.9 | 38.95 | 125.51 | 32.95 | 63.66 | 16.11 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
FudanFVL | 84.47 | 69.66 | 51.95 | 33.46 | 109.33 | 31.08 | 60.34 | 14.79 | - | - |
7_10-7_40000_predict_test.json | 73.6 | 54.26 | 34.59 | 18.95 | 63.96 | 24.52 | 51.23 | 11.14 | - | - |
area_attention | 73.19 | 53.56 | 32.94 | 17.49 | 50.34 | 22.43 | 49.79 | 9.7 | - | - |
Oscar | 80.54 | 62.32 | 40.65 | 22.37 | 82.07 | 25.91 | 54.78 | 11.53 | - | - |
Neural Baby Talk | 73.69 | 54.1 | 32.37 | 15.99 | 53.21 | 21.93 | 49.63 | 9.26 | - | - |
vinvl_yuan_cbs | 80.24 | 62.31 | 41.07 | 21.53 | 80.21 | 25.98 | 54.52 | 12.12 | - | - |
ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.93 | 65.88 | 46.72 | 27.94 | 89.87 | 27.89 | 57.34 | 12.98 | - | - |
None | 72.91 | 53.74 | 33.49 | 18.04 | 58.5 | 23.12 | 50.53 | 10.28 | - | - |
PaLI | - | - | - | - | - | - | - | 15.75 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
CoCa - Google Brain | 87.53 | 74.49 | 57.89 | 38.92 | 120.73 | 32.71 | 62.91 | 15.54 | - | - |
MQ-UpDown-C | 77.76 | 59.0 | 38.29 | 21.0 | 76.34 | 25.59 | 53.15 | 11.87 | - | - |
RCAL | 79.21 | 62.26 | 40.77 | 22.56 | 84.0 | 26.3 | 54.62 | 12.47 | - | - |
camel XE | 79.21 | 62.06 | 42.51 | 25.06 | 79.14 | 26.87 | 55.24 | 12.14 | - | - |
nocaps_training | 75.25 | 56.93 | 36.91 | 20.49 | 56.85 | 23.6 | 51.84 | 10.33 | - | - |
UpDown | 75.25 | 56.93 | 36.91 | 20.49 | 56.85 | 23.6 | 51.84 | 10.33 | - | - |
ClipCap (MLP + GPT2 tuning) | - | - | - | - | 67.69 | - | - | 11.26 | ClipCap: CLIP Prefix for Image Captioning | |