HyperAI
Image Captioning On Nocaps Val Out Domain
Metrics
CIDEr
SPICE
Results
Performance of various models on this benchmark (higher is better for both metrics; "-" means no score reported).
Model name | CIDEr | SPICE | Paper title
Enc-Dec | 94.5 | 11.9 | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
BLIP_CapFilt-L | 111.5 | 14.2 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BLIP_ViT-L | 115.3 | 14.4 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
BLIP-2 ViT-G FlanT5 XL (zero-shot) | 124.8 | 15.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
OmniVL | 106.3 | 14.2 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks
BLIP-2 ViT-G OPT 6.7B (zero-shot) | 124.4 | 14.8 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
LEMON_large | 111.3 | 14.0 | Scaling Up Vision-Language Pre-training for Image Captioning
BLIP-2 ViT-G OPT 2.7B (zero-shot) | 123.4 | 15.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
SimVLM | 115.2 | - | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
VinVL | 88.3 | 12.1 | VinVL: Revisiting Visual Representations in Vision-Language Models
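The results are not listed in score order, so ranking them per metric makes the comparison easier. The following Python sketch hard-codes the values from the results listed above; the `RESULTS` list and the `rank_by` helper are hypothetical illustrations, not part of HyperAI or any benchmark toolkit. Both CIDEr and SPICE are higher-is-better, and rows with no reported score (SimVLM's SPICE) are skipped rather than treated as zero.

```python
# Leaderboard rows for nocaps val (out-domain): (model, CIDEr, SPICE).
# SPICE is None where the leaderboard shows "-" (no score reported).
RESULTS = [
    ("Enc-Dec", 94.5, 11.9),
    ("BLIP_CapFilt-L", 111.5, 14.2),
    ("BLIP_ViT-L", 115.3, 14.4),
    ("BLIP-2 ViT-G FlanT5 XL (zero-shot)", 124.8, 15.1),
    ("OmniVL", 106.3, 14.2),
    ("BLIP-2 ViT-G OPT 6.7B (zero-shot)", 124.4, 14.8),
    ("LEMON_large", 111.3, 14.0),
    ("BLIP-2 ViT-G OPT 2.7B (zero-shot)", 123.4, 15.1),
    ("SimVLM", 115.2, None),
    ("VinVL", 88.3, 12.1),
]

# Column indices into each row tuple.
CIDER = 1
SPICE = 2


def rank_by(metric_index, rows=RESULTS):
    """Return rows that report the metric, sorted best-first (higher is better).

    Python's sort is stable, so tied scores keep their leaderboard order.
    """
    reported = [row for row in rows if row[metric_index] is not None]
    return sorted(reported, key=lambda row: row[metric_index], reverse=True)


if __name__ == "__main__":
    for model, cider, _ in rank_by(CIDER):
        print(f"{cider:6.1f}  {model}")
```

Ranked by CIDEr, the three zero-shot BLIP-2 variants lead and VinVL comes last; ranking by SPICE drops SimVLM because it reports no SPICE score.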