HyperAI超神経
Mmr Total On Mrr Benchmark
Evaluation Metric: Total Column Score
Evaluation Results
Performance of each model on this benchmark.
Model Name        | Total Column Score | Paper Title                                                                                          | Repository
InternVL2-8B      | 368                | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks       |
Idefics-80B       | 139                | OBELICS: An Open Web-Scale Filtered Dataset of Interleaved Image-Text Documents                      |
Idefics-2-8B      | 256                | What matters when building vision-language models?                                                   | -
InternVL2-1B      | 237                | InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks       |
GPT-4o            | 457                | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | -
Phi-3-Vision      | 397                | Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone                        | -
Qwen-vl-max       | 366                | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
LLaVA-NEXT-13B    | 335                | Visual Instruction Tuning                                                                            |
LLaVA-NEXT-34B    | 412                | Visual Instruction Tuning                                                                            |
GPT-4V            | 415                | The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision)                                        |
Claude 3.5 Sonnet | 463                | Claude 3.5 Sonnet Model Card Addendum                                                                | -
Monkey-Chat-7B    | 214                | Monkey: Image Resolution and Text Label Are Important Things for Large Multi-modal Models            |
Qwen-vl-plus      | 310                | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
LLaVA-1.5-13B     | 243                | Visual Instruction Tuning                                                                            |