HyperAI

Image Captioning On Nocaps Val Near Domain

Evaluation Metrics

- CIDEr
- Pre-train (#images)
- SPICE

CIDEr and SPICE are standard image-captioning metrics (higher is better); Pre-train (#images) is the number of images in each model's pre-training corpus.
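To make the CIDEr column concrete: the full metric averages TF-IDF-weighted cosine similarities between candidate and reference n-gram vectors over 1- to 4-grams. Below is a deliberately simplified, IDF-free, single-n sketch that only illustrates the cosine-similarity core; all helper names are mine, and this is not the scoring code used for this leaderboard.

```python
from collections import Counter
from math import sqrt

def ngram_counts(tokens, n):
    """Term-frequency vector of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def simple_cider_n(candidate, references, n=1):
    """IDF-free CIDEr-n sketch: mean cosine similarity between the
    candidate's n-gram TF vector and each reference's, scaled by 10."""
    cand = ngram_counts(candidate.lower().split(), n)
    sims = [cosine(cand, ngram_counts(ref.lower().split(), n)) for ref in references]
    return 10.0 * sum(sims) / len(sims)

print(simple_cider_n("a dog runs", ["a dog runs"]))  # identical caption scores the maximum (~10)
print(simple_cider_n("a black dog", ["a dog"]))      # partial overlap scores strictly between 0 and 10
```

The real CIDEr additionally down-weights n-grams that are common across the whole reference corpus (the IDF term), which is why it rewards distinctive rather than generic phrasing.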

Evaluation Results

Performance of each model on this benchmark:

| Model | CIDEr | Pre-train (#images) | SPICE | Paper |
|---|---|---|---|---|
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 119.2 | 1.1B | 15.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| OmniVL | 108.3 | 14M | 14.9 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks |
| VinVL | 96.1 | 5.7M | 13.8 | VinVL: Revisiting Visual Representations in Vision-Language Models |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 120.2 | 1.1B | 15.9 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| Enc-Dec | 88.3 | - | 12.1 | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts |
| BLIP_ViT-L | 112.1 | 129M | 14.9 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| LEMON_large | 113.3 | 200M | 15.1 | Scaling Up Vision-Language Pre-training for Image Captioning |
| SimVLM | 110.9 | 1.8B | - | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision |
| BLIP_CapFilt-L | 108.6 | 129M | 14.8 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 117.8 | 1.1B | 15.4 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
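For readers who want to work with these results programmatically, here is a minimal sketch that ranks the rows by CIDEr; the tuples are transcribed from the table above, and the variable names are mine.

```python
# (model, CIDEr, SPICE) rows transcribed from the leaderboard table;
# None marks a score the table does not report.
rows = [
    ("BLIP-2 ViT-G OPT 6.7B (zero-shot)", 119.2, 15.3),
    ("OmniVL", 108.3, 14.9),
    ("VinVL", 96.1, 13.8),
    ("BLIP-2 ViT-G FlanT5 XL (zero-shot)", 120.2, 15.9),
    ("Enc-Dec", 88.3, 12.1),
    ("BLIP_ViT-L", 112.1, 14.9),
    ("LEMON_large", 113.3, 15.1),
    ("SimVLM", 110.9, None),
    ("BLIP_CapFilt-L", 108.6, 14.8),
    ("BLIP-2 ViT-G OPT 2.7B (zero-shot)", 117.8, 15.4),
]

# Rank by CIDEr, descending (the primary metric on this page).
ranked = sorted(rows, key=lambda r: r[1], reverse=True)
for name, cider, spice in ranked[:3]:
    print(f"{name}: CIDEr={cider}, SPICE={spice}")
```

Sorting on CIDEr alone matches the table's implicit ordering criterion; the three BLIP-2 variants and the large single-model entries cluster at the top.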
© HyperAI초신경