Visual Reasoning on Winoground
Metrics: Group Score, Image Score, Text Score
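For reference, each Winoground item pairs two images with two captions that use the same words in a different order. Under the benchmark's definitions, an item counts toward the text score when each image's correct caption outscores the wrong one, toward the image score when each caption's correct image outscores the wrong one, and toward the group score only when both conditions hold. Below is a minimal sketch of the metric computation, assuming a hypothetical similarity function score(caption, image) that returns higher values for better matches:

```python
def winoground_metrics(items, score):
    """Compute Winoground text/image/group scores as percentages.

    items: iterable of (c0, i0, c1, i1) tuples, where caption c_k
           is the correct match for image i_k.
    score: hypothetical similarity function score(caption, image) -> float,
           higher meaning a better caption-image match.
    """
    text_hits = image_hits = group_hits = n = 0
    for c0, i0, c1, i1 in items:
        s00, s01 = score(c0, i0), score(c0, i1)
        s10, s11 = score(c1, i0), score(c1, i1)
        # Text score: each image prefers its own caption over the other.
        text_ok = s00 > s10 and s11 > s01
        # Image score: each caption prefers its own image over the other.
        image_ok = s00 > s01 and s11 > s10
        text_hits += text_ok
        image_hits += image_ok
        # Group score: both conditions must hold for the same item.
        group_hits += text_ok and image_ok
        n += 1
    return (100 * text_hits / n,
            100 * image_hits / n,
            100 * group_hits / n)
```

Per the original Winoground paper, chance performance is 25.00 for the text and image scores and 16.67 for the group score, which is useful context for reading the absolute numbers in the table below.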
Results

Performance of the evaluated models on this benchmark:
| Model | Group Score | Image Score | Text Score | Paper |
|---|---|---|---|---|
| GPT-4V (CoT, pick b/w two options) | 58.75 | 68.75 | 75.25 | The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task |
| GPT-4V (pick b/w two options) | 39.25 | 46.25 | 69.25 | The Role of Chain-of-Thought in Complex Vision-Language Reasoning Task |
| MMICL + CoCoT | 50.75 | 52.50 | 64.25 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| GPT-4V + CoCoT | 44.50 | 49.50 | 58.50 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| OpenFlamingo + CoCoT | 41.50 | 55.25 | 58.25 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| GPT-4V | 37.75 | 42.50 | 54.50 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| FIBER (EqSim) | 27.50 | 32.00 | 51.50 | Equivariant Similarity for Vision-Language Foundation Models |
| FIBER (finetuned, Flickr30k) | 23.00 | 26.50 | 51.25 | Equivariant Similarity for Vision-Language Foundation Models |
| MMICL + CCoT | 47.50 | 48.00 | 51.00 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| OpenFlamingo + DDCoT | 39.00 | 47.25 | 47.50 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| VQ2 | 30.50 | 42.20 | 47.00 | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| MMICL + DDCoT | 36.75 | 45.00 | 46.75 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| X-VLM 16M | 21.20 | 24.50 | 46.70 | Measuring Progress in Fine-grained Vision-and-Language Understanding |
| PaLI (ft SNLI-VE + Synthetic Data) | 28.75 | 38.00 | 46.50 | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| FIBER | 22.25 | 25.75 | 46.25 | Equivariant Similarity for Vision-Language Foundation Models |
| MMICL (FLAN-T5-XXL) | 43.00 | 44.99 | 45.50 | MMICL: Empowering Vision-language Model with Multi-Modal In-Context Learning |
| METER (EqSim) | 18.75 | 22.75 | 45.00 | Equivariant Similarity for Vision-Language Foundation Models |
| PaLI (ft SNLI-VE) | 28.70 | 41.50 | 45.00 | What You See is What You Read? Improving Text-Image Alignment Evaluation |
| Gemini + DDCoT | 23.75 | 25.00 | 45.00 | CoCoT: Contrastive Chain-of-Thought Prompting for Large Multimodal Models with Multiple Image Inputs |
| X-VLM 4M | 21.50 | 26.70 | 44.00 | Measuring Progress in Fine-grained Vision-and-Language Understanding |
The first 20 of 113 leaderboard entries are shown, sorted by Text Score in descending order.