HyperAI

Visual Question Answering on MM-Vet v2

Metrics

GPT-4 score

Results

Performance results of various models on this benchmark

| Model Name | GPT-4 score | Paper Title | Repository |
| --- | --- | --- | --- |
| InternVL-Chat-V1-2 | 45.5±0.1 | - | - |
| InternVL2-Llama3-76B | 68.4±0.3 | - | - |
| Claude 3.5 Sonnet (claude-3-5-sonnet-20240620) | 71.8±0.2 | Claude 3.5 Sonnet Model Card Addendum | - |
| CogVLM-Chat | 45.1±0.2 | CogVLM: Visual Expert for Pretrained Language Models | |
| LLaVA-NeXT-34B | 50.9±0.1 | - | - |
| Qwen-VL-Max | 55.8±0.2 | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | |
| Emu2-Chat | 38.0±0.1 | Generative Multimodal Models are In-Context Learners | |
| Otter-9B | 23.2±0.1 | MIMIC-IT: Multi-Modal In-Context Instruction Tuning | |
| gemini-2.0-flash-exp | 77.1±0.1 | - | - |
| InternVL2-40B | 63.8±0.2 | - | - |
| Gemini Pro Vision | 57.2±0.2 | Gemini: A Family of Highly Capable Multimodal Models | |
| OpenFlamingo-9B | 17.6±0.2 | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | |
| GPT-4o (gpt-4o-2024-11-20) | 72.1±0.2 | GPT-4 Technical Report | |
| LLaVA-v1.5-13B | 33.2±0.1 | Improved Baselines with Visual Instruction Tuning | |
| Claude 3 Opus (claude-3-opus-20240229) | 55.8±0.2 | - | - |
| InternVL-Chat-V1-5 | 51.5±0.2 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | - |
| IXC2-VL-7B | 42.5±0.3 | InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model | |
| GPT-4o (gpt-4o-2024-05-13) | 71.0±0.2 | GPT-4 Technical Report | |
| LLaVA-v1.5-7B | 28.3±0.2 | Improved Baselines with Visual Instruction Tuning | |
| Qwen2-VL-72B (qwen-vl-max-0809) | 66.9±0.3 | Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution | |