Image Captioning
Image Captioning On Nocaps Val Near Domain
Metrics: CIDEr, Pre-train (#images), SPICE
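For reference, the CIDEr and SPICE numbers on this leaderboard are typically produced with the COCO caption evaluation toolkit, with CIDEr usually reported scaled by 100 (so 120.2 corresponds to a raw score of about 1.202). Below is a minimal sketch using the pycocoevalcap package; the image ID and caption strings are purely illustrative, and the captions are assumed to be already lower-cased and tokenized.

```python
# Minimal sketch: corpus-level CIDEr and SPICE scoring with pycocoevalcap.
# Assumes captions are pre-tokenized; IDs and strings here are made up.
from pycocoevalcap.cider.cider import Cider
from pycocoevalcap.spice.spice import Spice

# References: each image ID maps to a list of ground-truth captions.
gts = {
    "img_0": ["a brown dog runs across a grassy field",
              "a dog running on the grass"],
}
# Candidates: each image ID maps to exactly one generated caption.
res = {
    "img_0": ["a dog running through a field"],
}

cider_score, _ = Cider().compute_score(gts, res)   # corpus-level CIDEr
spice_score, _ = Spice().compute_score(gts, res)   # requires Java on PATH
print(f"CIDEr: {cider_score:.3f}  SPICE: {spice_score:.3f}")
```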
Results
Performance results of the various models on this benchmark.
| Model name | CIDEr | Pre-train (#images) | SPICE | Paper Title |
|---|---|---|---|---|
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 120.2 | 1.1B | 15.9 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 119.2 | 1.1B | 15.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 117.8 | 1.1B | 15.4 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| LEMON_large | 113.3 | 200M | 15.1 | Scaling Up Vision-Language Pre-training for Image Captioning |
| BLIP_ViT-L | 112.1 | 129M | 14.9 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| SimVLM | 110.9 | 1.8B | - | SimVLM: Simple Visual Language Model Pretraining with Weak Supervision |
| BLIP_CapFilt-L | 108.6 | 129M | 14.8 | BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation |
| OmniVL | 108.3 | 14M | 14.9 | OmniVL: One Foundation Model for Image-Language and Video-Language Tasks |
| VinVL | 96.1 | 5.7M | 13.8 | VinVL: Revisiting Visual Representations in Vision-Language Models |
| Enc-Dec | 88.3 | - | 12.1 | Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts |