Visual Question Answering on VQA v2 val
Metrics
Accuracy
Results
Performance of different models on this benchmark
| Model name | Accuracy (%) | Paper title |
|---|---|---|
| BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| PNP-VQA | 63.3 | Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training |
| BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| Few VLM (zero-shot) | 47.7 | A Good Prompt Is Worth Millions of Parameters: Low-resource Prompt-based Learning for Vision-Language Models |
| MetaLM | 41.1 | Language Models are General-Purpose Interfaces |
| VLKD (ViT-B/16) | 38.6 | Enabling Multimodal Generation on CLIP via Vision-Language Knowledge Distillation |
| Frozen | 29.5 | Multimodal Few-Shot Learning with Frozen Language Models |