HyperAI

Image Captioning on nocaps In-Domain

Metrics

B1
B2
B3
B4
CIDEr
METEOR
ROUGE-L
SPICE

Results

Performance results of various models on this benchmark

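| Model name | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| GIT2, Single Model | 88.86 | 75.86 | 59.94 | 41.1 | 124.18 | 33.83 | 63.82 | 16.36 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| MD | 84.03 | 69.12 | 51.16 | 33.15 | 100.03 | 30.06 | 59.67 | 14.08 | - | - |
| CoCa - Google Brain | 87.27 | 74.29 | 58.01 | 39.24 | 117.9 | 33.01 | 63.12 | 15.49 | - | - |
| 7_10-7_40000_predict_test.json | 75.31 | 56.79 | 37.85 | 21.91 | 73.73 | 26.02 | 52.44 | 12.04 | - | - |
| IEDA-LAB | 84.4 | 69.8 | 51.89 | 32.86 | 102.64 | 30.43 | 60.07 | 14.47 | - | - |
| PaLI | - | - | - | - | 149.1 | - | - | - | PaLI: A Jointly-Scaled Multilingual Language-Image Model | |
| Xinyi | 81.61 | 63.74 | 43.22 | 24.82 | 84.79 | 27.27 | 55.03 | 12.3 | - | - |
| evertyhing | 79.58 | 63.09 | 43.92 | 26.07 | 87.86 | 27.97 | 55.88 | 12.6 | - | - |
| CS395T | 72.24 | 51.88 | 29.57 | 14.54 | 58.93 | 22.04 | 49.05 | 8.91 | - | - |
| cxy_nocaps_training | 81.64 | 63.79 | 43.43 | 25.15 | 85.81 | 27.25 | 55.06 | 12.35 | - | - |
| FudanWYZ | 82.91 | 68.02 | 50.75 | 33.59 | 104.25 | 31.33 | 59.67 | 14.85 | - | - |
| GIT, Single Model | 88.55 | 76.1 | 60.53 | 41.65 | 122.4 | 33.41 | 64.02 | 16.18 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| YX | 76.48 | 58.76 | 39.28 | 21.96 | 69.59 | 25.08 | 53.22 | 10.94 | - | - |
| UpDown | 77.68 | 60.34 | 41.5 | 24.57 | 74.27 | 26.04 | 54.42 | 11.47 | - | - |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 82.9 | 68.09 | 49.73 | 31.24 | 96.63 | 29.37 | 58.62 | 13.61 | - | - |
| Single Model | 84.64 | 70.0 | 52.96 | 34.66 | 108.98 | 31.97 | 61.01 | 14.6 | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| coco_all_19 | 72.76 | 53.52 | 34.13 | 19.45 | 64.37 | 23.47 | 50.53 | 10.11 | - | - |
| MQ-UpDown-C | 78.73 | 61.63 | 42.35 | 25.94 | 80.19 | 27.25 | 55.25 | 12.38 | - | - |
| Human | 76.89 | 57.3 | 37.78 | 21.49 | 80.61 | 28.53 | 53.47 | 14.99 | - | - |
| GRIT (zero-shot, no VL pretraining, no CBS) | - | - | - | - | 105.9 | - | - | 13.6 | GRIT: Faster and Better Image captioning Transformer Using Dual Visual Features | |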