HyperAI

Visual Question Answering (VQA) on CORE-MM

Metrics

Abductive
Analogical
Deductive
Overall score
Params

Results

Performance of various models on this benchmark

Comparison table
| Model name                                   | Abductive | Analogical | Deductive | Overall score | Params |
|----------------------------------------------|-----------|------------|-----------|---------------|--------|
| minigpt-4-enhancing-vision-language          | 13.28     | 5.69       | 11.02     | 10.43         | 8B     |
| blip-2-bootstrapping-language-image-pre      | 18.96     | 7.5        | 2.76      | 19.31         | 3B     |
| gpt-4-technical-report-1                     | 77.88     | 69.86      | 74.86     | 74.44         | -      |
| sphinx-the-joint-mixing-of-weights-tasks-and | 49.85     | 20.69      | 42.17     | 39.48         | 16B    |
| instructblip-towards-general-purpose-vision  | 37.76     | 20.56      | 27.56     | 28.02         | 8B     |
| generative-pretraining-in-multimodality      | 36.57     | 18.19      | 28.9      | 28.24         | 14B    |
| otter-a-multi-modal-model-with-in-context    | 33.64     | 13.33      | 22.49     | 22.69         | 7B     |
| cogvlm-visual-expert-for-pretrained-language | 47.88     | 28.75      | 36.75     | 37.16         | 17B    |
| mplug-owl2-revolutionizing-multi-modal-large | 20.6      | 7.64       | 23.43     | 20.05         | 7B     |
| openflamingo-an-open-source-framework-for    | 5.3       | 1.11       | 8.88      | 6.82          | 9B     |
| improved-baselines-with-visual-instruction   | 47.91     | 24.31      | 30.94     | 32.62         | 13B    |
| qwen-vl-a-frontier-large-vision-language     | 44.39     | 30.42      | 37.55     | 37.39         | 16B    |
| llama-adapter-v2-parameter-efficient-visual  | 46.12     | 22.08      | 28.7      | 30.46         | 7B     |
| internlm-xcomposer-a-vision-language-large   | 35.97     | 18.61      | 26.77     | 26.84         | 9B     |
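For readers who want to work with this table programmatically, here is a minimal Python sketch that copies the rows into typed records and ranks the models by the reported Overall score. The `Result` record and the `rank_by_overall` helper are hypothetical names chosen for illustration. The Overall score is used as reported rather than recomputed, because it is not simply the mean of the three category columns: BLIP-2's Overall score (19.31) is higher than each of its Abductive, Analogical, and Deductive scores.

```python
from typing import List, NamedTuple, Optional


class Result(NamedTuple):
    """One leaderboard row (hypothetical record type for illustration)."""
    model: str
    abductive: float
    analogical: float
    deductive: float
    overall: float               # taken as reported, not recomputed
    params_b: Optional[float]    # billions of parameters; None when not reported


# Rows copied from the comparison table above.
RESULTS: List[Result] = [
    Result("minigpt-4-enhancing-vision-language", 13.28, 5.69, 11.02, 10.43, 8),
    Result("blip-2-bootstrapping-language-image-pre", 18.96, 7.5, 2.76, 19.31, 3),
    Result("gpt-4-technical-report-1", 77.88, 69.86, 74.86, 74.44, None),
    Result("sphinx-the-joint-mixing-of-weights-tasks-and", 49.85, 20.69, 42.17, 39.48, 16),
    Result("instructblip-towards-general-purpose-vision", 37.76, 20.56, 27.56, 28.02, 8),
    Result("generative-pretraining-in-multimodality", 36.57, 18.19, 28.9, 28.24, 14),
    Result("otter-a-multi-modal-model-with-in-context", 33.64, 13.33, 22.49, 22.69, 7),
    Result("cogvlm-visual-expert-for-pretrained-language", 47.88, 28.75, 36.75, 37.16, 17),
    Result("mplug-owl2-revolutionizing-multi-modal-large", 20.6, 7.64, 23.43, 20.05, 7),
    Result("openflamingo-an-open-source-framework-for", 5.3, 1.11, 8.88, 6.82, 9),
    Result("improved-baselines-with-visual-instruction", 47.91, 24.31, 30.94, 32.62, 13),
    Result("qwen-vl-a-frontier-large-vision-language", 44.39, 30.42, 37.55, 37.39, 16),
    Result("llama-adapter-v2-parameter-efficient-visual", 46.12, 22.08, 28.7, 30.46, 7),
    Result("internlm-xcomposer-a-vision-language-large", 35.97, 18.61, 26.77, 26.84, 9),
]


def rank_by_overall(results: List[Result]) -> List[Result]:
    """Return a new list sorted by Overall score, highest first."""
    return sorted(results, key=lambda r: r.overall, reverse=True)


if __name__ == "__main__":
    for rank, r in enumerate(rank_by_overall(RESULTS), start=1):
        # Show the parameter count as e.g. "13B", or "-" when it is not reported.
        size = f"{r.params_b:g}B" if r.params_b is not None else "-"
        print(f"{rank:2d}. {r.model:<46} overall={r.overall:6.2f} params={size}")
```

Running the script prints one line per model, ordered from the highest to the lowest Overall score. A `NamedTuple` keeps each row immutable and lets you refer to columns by name.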