Image Captioning
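Image captioning is the task of generating an accurate natural-language description of the content of an input image. It combines techniques from computer vision and natural language processing, and typically uses an encoder-decoder framework: the encoder turns the image into an intermediate representation, and the decoder generates descriptive text from that representation. The main evaluation metrics include BLEU and CIDEr, and commonly used datasets include nocaps and COCO. Image captioning has important applications in helping visually impaired people understand images, in automated content tagging, and in intelligent image search.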
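To make the BLEU metric mentioned above more concrete, here is a minimal sketch of a sentence-level BLEU score in plain Python. It assumes whitespace-tokenized captions, uniform weights over 1- to 4-grams, and no smoothing. The function names (`ngrams`, `modified_precision`, `bleu`) are illustrative and are not taken from any benchmark's official evaluation code, which may differ in tokenization and corpus-level aggregation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: a candidate n-gram is credited at most
    as many times as it appears in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Unsmoothed sentence-level BLEU with uniform n-gram weights (a sketch)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # without smoothing, any zero precision zeroes the score
    c = len(candidate)
    # Effective reference length: the reference length closest to the
    # candidate length (ties broken toward the shorter reference).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

A caption that exactly matches one of its references scores 1.0, while a caption sharing no words with any reference scores 0.0. Reported leaderboard numbers are usually computed at corpus level with a standard toolkit, so this sketch is for intuition only.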
AIC-ICC
BanglaLekhaImageCaptions
CNN + 1D CNN
ChEBI-20
GIT-Mol
MS COCO
ExpansionNet v2
COCO Captions
VAST
COCO Captions test
From Captions to Visual Concepts and Back
Conceptual Captions
ClipCap (MLP + GPT2 tuning)
Flickr30k Captions test
Unified VLP
FlickrStyle10K
CapDec
foundation-multimodal-models/DetailCaps-4870
IU X-Ray
Localized Narratives
MS-COCO
NeuSyRE
MSCOCO
CapDec
nocaps entire
nocaps in-domain
VinVL (Microsoft Cognitive Services + MSR)
nocaps near-domain
GIT2, Single Model
nocaps out-of-domain
PaLI
nocaps val
Prismer
nocaps-val-in-domain
nocaps-val-near-domain
nocaps-val-out-domain
nocaps-val-overall
nocaps-XD entire
GIT2
nocaps-XD in-domain
GIT2
nocaps-XD near-domain
GIT2
nocaps-XD out-of-domain
GIT2
Object HalBench
Peir Gross
BiomedGPT
SCICAP
CNN+LSTM (Vision only, First sentence)
TextCaps 2020
VizWiz 2020 test
VizWiz 2020 test-dev
WHOOPS!