Image Captioning On Nocaps Val In Domain
Evaluation metrics: CIDEr, Pre-train (#images), SPICE
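CIDEr and SPICE are both reference-based caption metrics computed against human-written captions. As a rough illustration of how such scores are typically obtained, here is a minimal sketch using the pycocoevalcap package; the package choice, image ids, and captions are assumptions for illustration only, not the evaluation pipeline behind this leaderboard. Note that CIDEr's IDF weights are corpus-level, so scores are only meaningful over a full evaluation set, and both the tokenizer and SPICE shell out to Java jars.

```python
# Minimal sketch (assumption: pycocoevalcap is installed; captions are made up).
from pycocoevalcap.tokenizer.ptbtokenizer import PTBTokenizer
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

# Reference and generated captions keyed by image id, in COCO caption-eval format.
gts = {"img1": [{"caption": "a dog runs across a grassy field"},
                {"caption": "a brown dog running on grass"}]}
res = {"img1": [{"caption": "a dog running through the grass"}]}

tokenizer = PTBTokenizer()          # requires Java (Stanford PTB tokenizer jar)
gts_tok = tokenizer.tokenize(gts)   # -> {"img1": ["a dog runs across a grassy field", ...]}
res_tok = tokenizer.tokenize(res)

cider_score, _ = Cider().compute_score(gts_tok, res_tok)
spice_score, _ = Spice().compute_score(gts_tok, res_tok)  # also requires Java
print(f"CIDEr: {cider_score:.3f}  SPICE: {spice_score:.3f}")
```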
Evaluation results
Performance of each model on this benchmark:
| Model | CIDEr | Pre-train (#images) | SPICE | Paper Title | Repository |
|---|---|---|---|---|---|
| LEMON_base | 107.7 | 200M | 14.7 | Scaling Up Vision-Language Pre-training for Image Captioning | - |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 123.7 | 1.1B | 15.8 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| VinVL | 103.1 | 5.7M | 14.2 | VinVL: Revisiting Visual Representations in Vision-Language Models | |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 123.7 | 1.1B | 16.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
| BLIP_ViT-L | 114.9 | 129M | 15.2 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | |
| SimVLM | 113.7 | 1.8B | - | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision | |
| LEMON_large | 116.9 | 200M | 15.8 | Scaling Up Vision-Language Pre-training for Image Captioning | - |
| BLIP_CapFilt-L | 111.8 | 129M | 14.9 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation | |
| Enc-Dec | 92.6 | 15M | 12.5 | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts | |
| OmniVL | 104.6 | 14M | 15 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks | - |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 123 | 1.1B | 15.8 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | |
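The BLIP-2 rows are zero-shot results, i.e. the captioner is applied without task-specific fine-tuning on nocaps. Below is a minimal sketch of zero-shot caption generation with a publicly released BLIP-2 checkpoint via Hugging Face transformers; the checkpoint name and the sample image URL are illustrative assumptions and this is not the exact evaluation setup behind the numbers above.

```python
# Minimal sketch of zero-shot BLIP-2 captioning (assumption: transformers >= 4.27,
# checkpoint "Salesforce/blip2-opt-2.7b" and the COCO image URL are examples only).
import requests
import torch
from PIL import Image
from transformers import Blip2Processor, Blip2ForConditionalGeneration

device = "cuda" if torch.cuda.is_available() else "cpu"
processor = Blip2Processor.from_pretrained("Salesforce/blip2-opt-2.7b")
model = Blip2ForConditionalGeneration.from_pretrained("Salesforce/blip2-opt-2.7b").to(device)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"  # example image
image = Image.open(requests.get(url, stream=True).raw).convert("RGB")

# No text prompt: the model generates a caption directly from the image.
inputs = processor(images=image, return_tensors="pt").to(device)
generated_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.batch_decode(generated_ids, skip_special_tokens=True)[0].strip()
print(caption)
```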