HyperAI

Visual Question Answering (VQA) on Core-MM

Metrics

Abductive
Analogical
Deductive
Overall score
Params

Results

Performance results of various models on this benchmark

| Model name | Abductive | Analogical | Deductive | Overall score | Params | Paper title |
| --- | --- | --- | --- | --- | --- | --- |
| MiniGPT-v2 | 13.28 | 5.69 | 11.02 | 10.43 | 8B | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models |
| BLIP-2-OPT2.7B | 18.96 | 7.5 | 2.76 | 19.31 | 3B | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| GPT-4V | 77.88 | 69.86 | 74.86 | 74.44 | - | GPT-4 Technical Report |
| SPHINX v2 | 49.85 | 20.69 | 42.17 | 39.48 | 16B | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models |
| InstructBLIP | 37.76 | 20.56 | 27.56 | 28.02 | 8B | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| Emu | 36.57 | 18.19 | 28.9 | 28.24 | 14B | Emu: Generative Pretraining in Multimodality |
| Otter | 33.64 | 13.33 | 22.49 | 22.69 | 7B | Otter: A Multi-Modal Model with In-Context Instruction Tuning |
| CogVLM-Chat | 47.88 | 28.75 | 36.75 | 37.16 | 17B | CogVLM: Visual Expert for Pretrained Language Models |
| mPLUG-Owl2 | 20.6 | 7.64 | 23.43 | 20.05 | 7B | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration |
| OpenFlamingo-v2 | 5.3 | 1.11 | 8.88 | 6.82 | 9B | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models |
| LLaVA-1.5 | 47.91 | 24.31 | 30.94 | 32.62 | 13B | Improved Baselines with Visual Instruction Tuning |
| Qwen-VL-Chat | 44.39 | 30.42 | 37.55 | 37.39 | 16B | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| LLaMA-Adapter V2 | 46.12 | 22.08 | 28.7 | 30.46 | 7B | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model |
| InternLM-XComposer-VL | 35.97 | 18.61 | 26.77 | 26.84 | 9B | InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition |