HyperAI초신경
Image Captioning On Nocaps Val Out Domain
Evaluation metrics
CIDEr
SPICE
Evaluation results
Performance of each model on this benchmark
Model | CIDEr | SPICE | Paper title
Enc-Dec | 94.5 | 11.9 | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
BLIP_CapFilt-L | 111.5 | 14.2 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BLIP_ViT-L | 115.3 | 14.4 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BLIP-2 ViT-G FlanT5 XL (zero-shot) | 124.8 | 15.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
OmniVL | 106.3 | 14.2 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
BLIP-2 ViT-G OPT 6.7B (zero-shot) | 124.4 | 14.8 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
LEMON_large | 111.3 | 14.0 | Scaling Up Vision-Language Pre-training for Image Captioning
BLIP-2 ViT-G OPT 2.7B (zero-shot) | 123.4 | 15.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
SimVLM | 115.2 | - | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
VinVL | 88.3 | 12.1 | VinVL: Revisiting Visual Representations in Vision-Language Models
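The rows above are listed in page order, not by score. The sketch below ranks them by either metric, skipping models such as SimVLM that report no SPICE score. It is written in Python, and the `ROWS` list and the `rank_by` helper are hypothetical illustrations, not part of HyperAI or any captioning evaluation toolkit. It only sorts the reported numbers and does not compute CIDEr or SPICE.

```python
from typing import List, Optional, Tuple

# Hypothetical helper data: (model, CIDEr, SPICE) copied from the leaderboard;
# None marks a score the leaderboard does not report ("-").
ROWS: List[Tuple[str, float, Optional[float]]] = [
    ("Enc-Dec", 94.5, 11.9),
    ("BLIP_CapFilt-L", 111.5, 14.2),
    ("BLIP_ViT-L", 115.3, 14.4),
    ("BLIP-2 ViT-G FlanT5 XL (zero-shot)", 124.8, 15.1),
    ("OmniVL", 106.3, 14.2),
    ("BLIP-2 ViT-G OPT 6.7B (zero-shot)", 124.4, 14.8),
    ("LEMON_large", 111.3, 14.0),
    ("BLIP-2 ViT-G OPT 2.7B (zero-shot)", 123.4, 15.1),
    ("SimVLM", 115.2, None),
    ("VinVL", 88.3, 12.1),
]

# Position of each metric inside a row tuple.
METRIC_INDEX = {"CIDEr": 1, "SPICE": 2}


def rank_by(rows, metric):
    """Return rows best-first by the named metric, dropping rows without that score."""
    idx = METRIC_INDEX[metric]
    scored = [r for r in rows if r[idx] is not None]
    # Higher is better for both metrics, so sort descending; Python's sort is
    # stable, so tied scores keep their leaderboard order.
    return sorted(scored, key=lambda r: r[idx], reverse=True)


if __name__ == "__main__":
    for name, cider, spice in rank_by(ROWS, "CIDEr"):
        shown = "-" if spice is None else f"{spice:.1f}"
        print(f"{name:40s} CIDEr={cider:6.1f} SPICE={shown}")
```

Ties, such as the two 15.1 SPICE scores, stay in table order because `sorted` is stable even with `reverse=True`.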