Mmr Total On Mrr Benchmark
Metrics
Total Column Score
Results
Performance results of various models on this benchmark
| Model Name | Total Column Score | Paper Title | Repository |
| --- | --- | --- | --- |
| InternVL2-8B | 368 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
| Idefics-80B | 139 | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents | |
| Idefics-2-8B | 256 | What matters when building vision-language models? | - |
| InternVL2-1B | 237 | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks | |
| GPT-4o | 457 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |
| Phi-3-Vision | 397 | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone | - |
| Qwen-vl-max | 366 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| LLaVA-NEXT-13B | 335 | Visual Instruction Tuning | |
| LLaVA-NEXT-34B | 412 | Visual Instruction Tuning | |
| GPT-4V | 415 | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) | |
| Claude 3.5 Sonnet | 463 | Claude 3.5 Sonnet Model Card Addendum | - |
| Monkey-Chat-7B | 214 | Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models | |
| Qwen-vl-plus | 310 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| LLaVA-1.5-13B | 243 | Visual Instruction Tuning | |
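The results are listed in no particular order, so a ranking can be easier to read. The following is a minimal sketch that sorts the models by Total Column Score, assuming a higher score is better (the page does not state this explicitly). The `scores` dictionary and the `rank` helper are illustrative names, not part of the benchmark; the values are copied from the results above.

```python
# Total Column Score per model, copied from the results listed above.
scores = {
    "InternVL2-8B": 368,
    "Idefics-80B": 139,
    "Idefics-2-8B": 256,
    "InternVL2-1B": 237,
    "GPT-4o": 457,
    "Phi-3-Vision": 397,
    "Qwen-vl-max": 366,
    "LLaVA-NEXT-13B": 335,
    "LLaVA-NEXT-34B": 412,
    "GPT-4V": 415,
    "Claude 3.5 Sonnet": 463,
    "Monkey-Chat-7B": 214,
    "Qwen-vl-plus": 310,
    "LLaVA-1.5-13B": 243,
}


def rank(model_scores):
    """Return (position, model, score) tuples, highest score first.

    Assumption: a higher Total Column Score means better performance.
    """
    ordered = sorted(model_scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(pos, name, score) for pos, (name, score) in enumerate(ordered, start=1)]


ranking = rank(scores)
for position, name, score in ranking:
    print(f"{position:>2}. {name:<18} {score}")
```

Under that assumption, Claude 3.5 Sonnet (463) and GPT-4o (457) lead the list, and Idefics-80B (139) comes last.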