HyperAI超神経

Visual Question Answering (VQA) on CORE-MM

Evaluation Metrics

Abductive
Analogical
Deductive
Overall score
Params

Evaluation Results

Performance of each model on this benchmark.

| Model | Abductive | Analogical | Deductive | Overall score | Params | Paper Title |
| --- | --- | --- | --- | --- | --- | --- |
| MiniGPT-v2 | 13.28 | 5.69 | 11.02 | 10.43 | 8B | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models |
| BLIP-2-OPT2.7B | 18.96 | 7.5 | 2.76 | 19.31 | 3B | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |
| GPT-4V | 77.88 | 69.86 | 74.86 | 74.44 | - | GPT-4 Technical Report |
| SPHINX v2 | 49.85 | 20.69 | 42.17 | 39.48 | 16B | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models |
| InstructBLIP | 37.76 | 20.56 | 27.56 | 28.02 | 8B | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| Emu | 36.57 | 18.19 | 28.9 | 28.24 | 14B | Emu: Generative Pretraining in Multimodality |
| Otter | 33.64 | 13.33 | 22.49 | 22.69 | 7B | Otter: A Multi-Modal Model with In-Context Instruction Tuning |
| CogVLM-Chat | 47.88 | 28.75 | 36.75 | 37.16 | 17B | CogVLM: Visual Expert for Pretrained Language Models |
| mPLUG-Owl2 | 20.6 | 7.64 | 23.43 | 20.05 | 7B | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration |
| OpenFlamingo-v2 | 5.3 | 1.11 | 8.88 | 6.82 | 9B | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models |
| LLaVA-1.5 | 47.91 | 24.31 | 30.94 | 32.62 | 13B | Improved Baselines with Visual Instruction Tuning |
| Qwen-VL-Chat | 44.39 | 30.42 | 37.55 | 37.39 | 16B | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond |
| LLaMA-Adapter V2 | 46.12 | 22.08 | 28.7 | 30.46 | 7B | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model |
| InternLM-XComposer-VL | 35.97 | 18.61 | 26.77 | 26.84 | 9B | InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition |