Visual Question Answering on ViP-Bench
Metrics
GPT-4 score (bbox)
GPT-4 score (human)
Results
Performance of each model on this benchmark
Comparison table
Model | GPT-4 score (bbox) | GPT-4 score (human) |
---|---|---|
inst-it-boosting-multimodal-instance | 50.5 | 49.0 |
instructblip-towards-general-purpose-vision | 35.8 | 35.2 |
gpt-4-technical-report-1 | 60.7 | 59.9 |
gpt4roi-instruction-tuning-large-language | 35.1 | - |
kosmos-2-grounding-multimodal-large-language | 26.9 | - |
qwen-vl-a-frontier-large-vision-language | 45.3 | - |
improved-baselines-with-visual-instruction | 41.8 | 42.9 |
shikra-unleashing-multimodal-llm-s | 33.7 | - |
improved-baselines-with-visual-instruction | 47.1 | - |
gpt-4-technical-report-1 | 52.8 | 51.4 |
qwen-vl-a-frontier-large-vision-language | 39.2 | 41.7 |
inst-it-boosting-multimodal-instance | 45.1 | 48.2 |
making-large-language-models-better-data | 48.3 | 48.2 |
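
For readers who want to work with these numbers programmatically, the following is a minimal sketch of how the rows could be parsed and ranked by the bbox score. It assumes that a `-` cell means no score is listed for that metric; the names `ROWS`, `to_score`, and `parse_rows` are hypothetical helpers introduced only for this illustration. Repeated model identifiers are kept as separate entries, exactly as they appear in the table.

```python
# Rows copied verbatim from the comparison table above.
# Assumption: "-" marks a score that is not listed for that metric.
ROWS = """\
inst-it-boosting-multimodal-instance | 50.5 | 49.0 |
instructblip-towards-general-purpose-vision | 35.8 | 35.2 |
gpt-4-technical-report-1 | 60.7 | 59.9 |
gpt4roi-instruction-tuning-large-language | 35.1 | - |
kosmos-2-grounding-multimodal-large-language | 26.9 | - |
qwen-vl-a-frontier-large-vision-language | 45.3 | - |
improved-baselines-with-visual-instruction | 41.8 | 42.9 |
shikra-unleashing-multimodal-llm-s | 33.7 | - |
improved-baselines-with-visual-instruction | 47.1 | - |
gpt-4-technical-report-1 | 52.8 | 51.4 |
qwen-vl-a-frontier-large-vision-language | 39.2 | 41.7 |
inst-it-boosting-multimodal-instance | 45.1 | 48.2 |
making-large-language-models-better-data | 48.3 | 48.2 |
"""


def to_score(cell):
    """Convert a table cell to a float, or None when the cell is '-'."""
    return None if cell == "-" else float(cell)


def parse_rows(text):
    """Split each pipe-delimited row into (model, bbox score, human score)."""
    entries = []
    for line in text.strip().splitlines():
        # Drop the trailing pipe, then split the remaining three cells.
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        name, bbox, human = cells
        entries.append((name, to_score(bbox), to_score(human)))
    return entries


entries = parse_rows(ROWS)

# Every row lists a bbox score, so sorting on it needs no None handling.
ranked = sorted(entries, key=lambda e: e[1], reverse=True)

for name, bbox, human in ranked:
    human_text = "n/a" if human is None else f"{human:.1f}"
    print(f"{bbox:5.1f}  {human_text:>5}  {name}")
```

Ranking by the human-prompt score instead would require filtering out the entries whose third field is `None` first, since not every row lists that metric.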