HyperAI
Visual Reasoning On Winoground
Metrics
- Group Score
- Image Score
- Text Score
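The three metrics follow the pairwise-matching scheme of the Winoground benchmark: each example pairs two images with two captions, and a model's image-text match scores must rank the correct pairings above the swapped ones. A minimal sketch of the three checks (the `s[c][i]` score matrix and its values here are illustrative, not from any model on the leaderboard):

```python
# Winoground-style scoring sketch. Each example has two captions (c0, c1)
# and two images (i0, i1); s[c][i] is the model's match score for
# caption c with image i. The correct pairings are (c0, i0) and (c1, i1).

def text_score(s):
    # Text score: for each image, the model must prefer its own caption.
    return s[0][0] > s[1][0] and s[1][1] > s[0][1]

def image_score(s):
    # Image score: for each caption, the model must prefer its own image.
    return s[0][0] > s[0][1] and s[1][1] > s[1][0]

def group_score(s):
    # Group score: both directions must be correct on the same example.
    return text_score(s) and image_score(s)

# Illustrative example: correct pairings score highest in both directions,
# so all three checks pass for this single example.
scores = [[0.9, 0.2],   # caption 0 scored against images 0, 1
          [0.1, 0.8]]   # caption 1 scored against images 0, 1
```

A model's leaderboard numbers are the percentage of examples for which each check passes; the group score is necessarily the lowest of the three.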
Results
Performance of different models on this benchmark.
| Model Name | Group Score | Image Score | Text Score | Paper Title |
|---|---|---|---|---|
| GPT-4V (CoT, pick b/w two options) | 58.75 | 68.75 | 75.25 | The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task |
| GPT-4V (pick b/w two options) | 39.25 | 46.25 | 69.25 | The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task |
| MMICL + CoCoT | 50.75 | 52.5 | 64.25 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| GPT-4V + CoCoT | 44.5 | 49.5 | 58.5 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| OpenFlamingo + CoCoT | 41.5 | 55.25 | 58.25 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| GPT-4V | 37.75 | 42.5 | 54.5 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| FIBER (EqSim) | 27.5 | 32.00 | 51.5 | Equivariant Similarity for Vision-Language Foundation Models |
| FIBER (finetuned, Flickr30k) | 23.00 | 26.50 | 51.25 | Equivariant Similarity for Vision-Language Foundation Models |
| MMICL + CCoT | 47.5 | 48 | 51 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| OpenFlamingo + DDCoT | 39 | 47.25 | 47.5 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| VQ2 | 30.5 | 42.2 | 47 | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| MMICL + DDCoT | 36.75 | 45 | 46.75 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| X-VLM 16M | 21.2 | 24.5 | 46.7 | Measuring Progress in Fine-grained Vision-and-Language Understanding |
| PaLI (ft SNLI-VE + Synthetic Data) | 28.75 | 38 | 46.5 | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| FIBER | 22.25 | 25.75 | 46.25 | Equivariant Similarity for Vision-Language Foundation Models |
| MMICL (FLAN-T5-XXL) | 43.00 | 44.99 | 45.50 | MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning |
| METER (EqSim) | 18.75 | 22.75 | 45.0 | Equivariant Similarity for Vision-Language Foundation Models |
| PaLI (ft SNLI-VE) | 28.70 | 41.50 | 45.00 | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| Gemini + DDCoT | 23.75 | 25 | 45 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| X-VLM 4M | 21.5 | 26.7 | 44.0 | Measuring Progress in Fine-grained Vision-and-Language Understanding |
Showing the top results; the full leaderboard contains 113 entries.