Image Captioning On Flickr30K Captions Test
Metrics
CIDEr
SPICE
Results
Performance results of various models on this benchmark
Model Name | CIDEr | SPICE | Paper Title | Repository |
---|---|---|---|---|
MetaLM | 43.3 | 11.7 | Language Models are General-Purpose Interfaces | |
Unified VLP | 67.4 | 17 | Unified Vision-Language Pre-Training for Image Captioning and VQA | |
FewVLM | 31.0 | 10.0 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models | |
KOSMOS-1 1.6B (zero-shot) | 67.1 | 14.5 | - | - |
VL-T5 | 2.6 | 2.0 | Unifying Vision-and-Language Tasks via Text Generation | |
Cornia et al | 46.4 | - | Paying More Attention to Saliency: Image Captioning with Saliency and Context Attention | - |
BRNN | 24.7 | - | Deep Visual-Semantic Alignments for Generating Image Descriptions | |