Visual Question Answering on MM-Vet
Metrics
GPT-4 score

Results
Performance results of various models on this benchmark.

Model name | GPT-4 score | Paper title
gemini-2.0-flash-exp | 81.2±0.4 | -
gemini-exp-1206 | 78.1±0.2 | -
Gemini 1.5 Pro (gemini-1.5-pro-002) | 76.9±0.1 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
MMCTAgent (GPT-4 + GPT-4V) | 74.24 | MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | 74.2±0.2 | Claude 3.5 Sonnet Model Card Addendum
Qwen2-VL-72B | 74.0 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
InternVL2.5-78B | 72.3 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
GPT-4o +text rationale +IoT | 72.2 | Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models
Lyra-Pro | 71.4 | Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
GLM-4V-Plus | 71.1 | CogVLM2: Visual Language Models for Image and Video Understanding
Phantom-7B | 70.8 | Phantom of Latent for Large Language and Vision Models
GPT-4o (gpt-4o-2024-05-13) | 69.3±0.1 | GPT-4 Technical Report
InternVL2.5-38B | 68.8 | Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
gpt-4o-mini-2024-07-18 | 68.6±0.1 | GPT-4 Technical Report
GPT-4V | 67.7±0.3 | GPT-4 Technical Report
GPT-4V-Turbo-detail:high | 67.6±0.1 | GPT-4 Technical Report
Qwen-VL-Max | 66.6±0.5 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Gemini 1.5 Pro (gemini-1.5-pro) | 65.8±0.1 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
InternVL2-26B (SGP, token ratio 64%) | 65.60 | A Stitch in Time Saves Nine: Small VLM is a Precise Guidance for Accelerating Large VLMs
Baichuan-Omni (7B) | 65.4 | Baichuan-Omni Technical Report
Showing 20 of 229 results.
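Several entries report a ± range alongside the GPT-4 score, which suggests the score is aggregated over repeated grading runs (the grader is itself an LLM, so repeated runs can differ slightly). Below is a minimal sketch of such an aggregation, assuming the ± denotes a standard deviation across runs; the run values are placeholders, not benchmark data.

```python
import statistics

def aggregate_runs(run_scores: list[float]) -> str:
    """Format repeated GPT-4-graded totals as 'mean±std' (one decimal place)."""
    mean = statistics.mean(run_scores)
    std = statistics.stdev(run_scores)  # sample standard deviation across runs
    return f"{mean:.1f}±{std:.1f}"

# Placeholder per-run totals for one hypothetical model (0-100 scale).
print(aggregate_runs([69.2, 69.4, 69.3, 69.3, 69.4]))  # -> "69.3±0.1"
```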