HyperAI

Image Captioning on nocaps Near-Domain

Metrics

B1 (BLEU-1)
B2 (BLEU-2)
B3 (BLEU-3)
B4 (BLEU-4)
CIDEr
METEOR
ROUGE-L
SPICE

Results

Performance results of various models on this benchmark

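| Model name | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| Neural Baby Talk + CBS | 74.77 | 53.67 | 30.66 | 13.85 | 61.98 | 22.55 | 49.45 | 9.83 | - | - |
| ClipCap (Transformer) | - | - | - | - | 66.82 | - | - | 10.92 | ClipCap: CLIP Prefix for Image Captioning | |
| Xinyi | 79.59 | 60.52 | 38.95 | 20.72 | 79.44 | 25.64 | 53.18 | 11.88 | - | - |
| GIT2, Single Model | 88.9 | 75.86 | 58.9 | 38.95 | 125.51 | 32.95 | 63.66 | 16.11 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| FudanFVL | 84.47 | 69.66 | 51.95 | 33.46 | 109.33 | 31.08 | 60.34 | 14.79 | - | - |
| 7_10-7_40000_predict_test.json | 73.6 | 54.26 | 34.59 | 18.95 | 63.96 | 24.52 | 51.23 | 11.14 | - | - |
| area_attention | 73.19 | 53.56 | 32.94 | 17.49 | 50.34 | 22.43 | 49.79 | 9.7 | - | - |
| Oscar | 80.54 | 62.32 | 40.65 | 22.37 | 82.07 | 25.91 | 54.78 | 11.53 | - | - |
| Neural Baby Talk | 73.69 | 54.1 | 32.37 | 15.99 | 53.21 | 21.93 | 49.63 | 9.26 | - | - |
| vinvl_yuan_cbs | 80.24 | 62.31 | 41.07 | 21.53 | 80.21 | 25.98 | 54.52 | 12.12 | - | - |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.93 | 65.88 | 46.72 | 27.94 | 89.87 | 27.89 | 57.34 | 12.98 | - | - |
| None | 72.91 | 53.74 | 33.49 | 18.04 | 58.5 | 23.12 | 50.53 | 10.28 | - | - |
| PaLI | - | - | - | - | - | - | - | 15.75 | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| CoCa - Google Brain | 87.53 | 74.49 | 57.89 | 38.92 | 120.73 | 32.71 | 62.91 | 15.54 | - | - |
| MQ-UpDown-C | 77.76 | 59.0 | 38.29 | 21.0 | 76.34 | 25.59 | 53.15 | 11.87 | - | - |
| RCAL | 79.21 | 62.26 | 40.77 | 22.56 | 84.0 | 26.3 | 54.62 | 12.47 | - | - |
| camel XE | 79.21 | 62.06 | 42.51 | 25.06 | 79.14 | 26.87 | 55.24 | 12.14 | - | - |
| nocaps_training | 75.25 | 56.93 | 36.91 | 20.49 | 56.85 | 23.6 | 51.84 | 10.33 | - | - |
| UpDown | 75.25 | 56.93 | 36.91 | 20.49 | 56.85 | 23.6 | 51.84 | 10.33 | - | - |
| ClipCap (MLP + GPT2 tuning) | - | - | - | - | 67.69 | - | - | 11.26 | ClipCap: CLIP Prefix for Image Captioning | |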