HyperAI

Image Captioning On Nocaps Entire

Metrics

B1 (BLEU-1)
B2 (BLEU-2)
B3 (BLEU-3)
B4 (BLEU-4)
CIDEr
METEOR
ROUGE-L
SPICE
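The B1 to B4 columns are BLEU-1 to BLEU-4 scores. As a rough illustration of what they measure, here is a minimal sketch of standard corpus-level BLEU: clipped n-gram precisions combined by a geometric mean and multiplied by a brevity penalty. It assumes whitespace-tokenized captions, uniform weights and no smoothing. It is not the official nocaps evaluation code, whose tokenization and other details may differ. The helper names `ngrams` and `corpus_bleu` are hypothetical. The function returns a value in [0, 1]. The table values appear to be these scores multiplied by 100.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list (hypothetical helper)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(candidates, references, max_n=4):
    """Corpus-level BLEU-max_n with uniform weights and no smoothing (sketch).

    candidates: one token list per image (the generated caption).
    references: for each image, a list of token lists (the human captions).
    """
    clipped = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # total candidate n-grams, per order n
    cand_len = 0
    ref_len = 0
    for cand, refs in zip(candidates, references):
        cand_len += len(cand)
        # Use the reference length closest to the candidate (ties -> shorter).
        ref_len += min((abs(len(r) - len(cand)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            cand_counts = ngrams(cand, n)
            # Highest count of each n-gram in any single reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            # Clipping: a candidate n-gram is credited at most max_ref times.
            clipped[n - 1] += sum(min(c, max_ref[g]) for g, c in cand_counts.items())
            totals[n - 1] += sum(cand_counts.values())
    # Without smoothing, any order with zero matches makes BLEU zero.
    if min(clipped) == 0:
        return 0.0
    log_precision = sum(math.log(c / t) for c, t in zip(clipped, totals)) / max_n
    # Brevity penalty punishes candidates shorter than their references.
    brevity_penalty = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return brevity_penalty * math.exp(log_precision)
```

For example, the candidate "the cat" against the single reference "the cat sat on the mat" has a perfect unigram precision. However, the brevity penalty reduces its BLEU-1 to exp(1 - 6/2) = exp(-2), about 0.135. Clipping also matters: it stops a caption such as "the the the" from scoring full precision just by repeating a common word.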

Results

Performance results of various models on this benchmark

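| Model name | B1 | B2 | B3 | B4 | CIDEr | METEOR | ROUGE-L | SPICE | Paper title | Repository |
|---|---|---|---|---|---|---|---|---|---|---|
| firethehole | 80.77 | 65.55 | 48.14 | 30.2 | 97.61 | 30.07 | 58.25 | 14.74 | - | - |
| cxy_nocaps_training | 78.75 | 59.36 | 37.56 | 19.72 | 78.48 | 25.13 | 52.54 | 11.57 | - | - |
| GIT, Single Model | 88.1 | 74.81 | 57.68 | 37.35 | 123.39 | 32.5 | 63.12 | 15.94 | GIT: A Generative Image-to-text Transformer for Vision and Language | |
| vinvl_yuan_cbs | 79.32 | 60.95 | 39.5 | 20.3 | 79.04 | 25.44 | 53.8 | 11.9 | - | - |
| Lyrics | - | - | - | - | 126.8 | - | - | - | Lyrics: Boosting Fine-grained Language-Vision Alignment and Comprehension via Semantic-aware Visual Objects | - |
| Neural Baby Talk | 72.33 | 52.42 | 30.83 | 14.73 | 53.36 | 21.52 | 48.87 | 9.15 | - | - |
| ClipCap (Transformer) | - | - | - | - | 65.83 | - | - | 10.86 | ClipCap: CLIP Prefix for Image Captioning | |
| Human | 76.64 | 56.46 | 36.37 | 19.48 | 85.34 | 28.15 | 52.83 | 14.67 | - | - |
| ViTCAP-CIDEr-136.7-ENC-DEC-ViTbfocal10-test-CBS | 81.03 | 64.62 | 45.26 | 26.52 | 87.56 | 27.36 | 56.7 | 12.81 | - | - |
| evertyhing | 78.92 | 61.6 | 41.52 | 23.52 | 86.0 | 26.31 | 54.75 | 12.1 | - | - |
| FudanWYZ | 82.95 | 67.45 | 49.58 | 31.38 | 106.81 | 30.32 | 59.18 | 14.56 | - | - |
| IEDA-LAB | 83.25 | 67.3 | 48.41 | 29.27 | 98.08 | 28.92 | 58.56 | 13.9 | - | - |
| CoCa - Google Brain | 87.01 | 73.71 | 56.88 | 37.71 | 120.55 | 32.29 | 62.52 | 15.47 | - | - |
| Neural Baby Talk + CBS | 73.42 | 52.12 | 29.35 | 12.88 | 61.48 | 22.06 | 48.74 | 9.69 | - | - |
| FudanFVL | 83.9 | 68.77 | 50.84 | 32.17 | 108.29 | 30.64 | 59.82 | 14.72 | - | - |
| Yu-Wu | 67.85 | 47.37 | 25.76 | 11.96 | 46.18 | 19.84 | 46.61 | 8.35 | - | - |
| VinVL (Microsoft Cognitive Services + MSR) | 81.59 | 65.15 | 45.04 | 26.15 | 92.46 | 27.57 | 56.96 | 13.07 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| camel XE | 77.97 | 60.27 | 40.68 | 23.48 | 75.88 | 26.15 | 54.3 | 11.89 | - | - |
| B2 | 73.04 | 54.08 | 33.88 | 17.69 | 47.69 | 21.85 | 49.97 | 9.42 | - | - |
| MD | 82.43 | 66.25 | 47.18 | 28.2 | 93.0 | 28.09 | 57.57 | 13.35 | - | - |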