HyperAI

Visual Question Answering (VQA) on CORE-MM

Metrics

Abductive
Analogical
Deductive
Overall score
Params

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                    Abductive  Analogical  Deductive  Overall score  Params
minigpt-4-enhancing-vision-language           13.28      5.69        11.02      10.43          8B
blip-2-bootstrapping-language-image-pre       18.96      7.5         2.76       19.31          3B
gpt-4-technical-report-1                      77.88      69.86       74.86      74.44          -
sphinx-the-joint-mixing-of-weights-tasks-and  49.85      20.69       42.17      39.48          16B
instructblip-towards-general-purpose-vision   37.76      20.56       27.56      28.02          8B
generative-pretraining-in-multimodality       36.57      18.19       28.9       28.24          14B
otter-a-multi-modal-model-with-in-context     33.64      13.33       22.49      22.69          7B
cogvlm-visual-expert-for-pretrained-language  47.88      28.75       36.75      37.16          17B
mplug-owl2-revolutionizing-multi-modal-large  20.6       7.64        23.43      20.05          7B
openflamingo-an-open-source-framework-for     5.3        1.11        8.88       6.82           9B
improved-baselines-with-visual-instruction    47.91      24.31       30.94      32.62          13B
qwen-vl-a-frontier-large-vision-language      44.39      30.42       37.55      37.39          16B
llama-adapter-v2-parameter-efficient-visual   46.12      22.08       28.7       30.46          7B
internlm-xcomposer-a-vision-language-large    35.97      18.61       26.77      26.84          9B