Visual Question Answering on MM-Vet v2
Metrics
GPT-4 score
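The GPT-4 score is produced by having GPT-4 grade each model answer and averaging the per-question grades into a single 0-100 benchmark score; the ± values in the table below reflect the spread across repeated evaluation runs. Below is a minimal aggregation sketch under those assumptions; the function name, data layout, and example numbers are illustrative only, not the official MM-Vet v2 evaluation code.

```python
from statistics import mean, stdev
from typing import Sequence


def aggregate_gpt4_score(runs: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Aggregate per-question judge scores (each in [0, 1]) from several
    evaluation runs into a benchmark score of the form mean ± spread.

    `runs` is a list of evaluation runs; each run is the list of
    per-question scores assigned by the GPT-4 judge (hypothetical layout).
    """
    # Score of a single run: average per-question score, scaled to 0-100.
    run_scores = [100.0 * mean(run) for run in runs]
    # Report the mean over runs and its standard deviation (0 if only one run).
    spread = stdev(run_scores) if len(run_scores) > 1 else 0.0
    return mean(run_scores), spread


# Example: three judge runs over the same five answers (made-up numbers).
runs = [
    [1.0, 0.5, 0.0, 1.0, 0.7],
    [1.0, 0.5, 0.1, 1.0, 0.7],
    [0.9, 0.5, 0.0, 1.0, 0.7],
]
score, spread = aggregate_gpt4_score(runs)
print(f"GPT-4 score: {score:.1f}±{spread:.1f}")  # -> GPT-4 score: 64.0±2.0
```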
Results
Performance of various models on the MM-Vet v2 benchmark, reported as GPT-4 score (higher is better).
Comparison Table
| Model Name | GPT-4 score |
| --- | --- |
| Model 1 | 45.5±0.1 |
| Model 2 | 68.4±0.3 |
| claude-3-5-sonnet-model-card-addendum | 71.8±0.2 |
| cogvlm-visual-expert-for-pretrained-language | 45.1±0.2 |
| Model 5 | 50.9±0.1 |
| qwen-vl-a-frontier-large-vision-language | 55.8±0.2 |
| generative-multimodal-models-are-in-context | 38.0±0.1 |
| mimic-it-multi-modal-in-context-instruction | 23.2±0.1 |
| Model 9 | 77.1±0.1 |
| Model 10 | 63.8±0.2 |
| gemini-a-family-of-highly-capable-multimodal-1 | 57.2±0.2 |
| openflamingo-an-open-source-framework-for | 17.6±0.2 |
| gpt-4-technical-report-1 | 72.1±0.2 |
| improved-baselines-with-visual-instruction | 33.2±0.1 |
| Model 15 | 55.8±0.2 |
| how-far-are-we-to-gpt-4v-closing-the-gap-to | 51.5±0.2 |
| internlm-xcomposer2-mastering-free-form-text | 42.5±0.3 |
| gpt-4-technical-report-1 | 71.0±0.2 |
| improved-baselines-with-visual-instruction | 28.3±0.2 |
| qwen2-vl-enhancing-vision-language-model-s | 66.9±0.3 |
| gpt-4-technical-report-1 | 66.8±0.3 |
| cogagent-a-visual-language-model-for-gui | 34.7±0.2 |
| gpt-4-technical-report-1 | 66.3±0.2 |
| gemini-1-5-unlocking-multimodal-understanding | 66.9±0.2 |