Visual Question Answering on BenchLMM
Metrics
GPT-3.5 score
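The leaderboard below reports a "GPT-3.5 score", i.e. answers are graded by GPT-3.5 acting as a judge. The snippet that follows is only a minimal sketch of this LLM-as-judge pattern; the exact prompt, scale, and aggregation used by BenchLMM may differ, and the function name and prompt wording here are illustrative assumptions.

```python
# Hypothetical sketch of an LLM-as-judge metric; not the official BenchLMM prompt.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def gpt35_score(question: str, reference: str, prediction: str) -> float:
    """Ask GPT-3.5 to rate a model answer against the reference (0-100)."""
    prompt = (
        "You are grading a visual question answering response.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Model answer: {prediction}\n"
        "Reply with a single integer score from 0 to 100 for correctness."
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return float(response.choices[0].message.content.strip())

# A benchmark-level score would average gpt35_score over all test examples.
```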
Results
Performance results of various models on this benchmark
| Model Name | GPT-3.5 score | Paper Title | Repository |
|---|---|---|---|
| MiniGPT4-13B | 34.93 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | |
| InstructBLIP-7B | 44.63 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| LLaVA-1.5-13B | 55.53 | Improved Baselines with Visual Instruction Tuning | |
| Sphinx-V2-1K | 57.43 | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | |
| LLaVA-1.5-7B | 46.83 | Visual Instruction Tuning | |
| InstructBLIP-13B | 45.03 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | |
| MiniGPTv2-7B | 30.1 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning | |
| GPT-4V | 58.37 | GPT-4 Technical Report | |
| LLaVA-1-13B | 43.50 | Visual Instruction Tuning | |
| Otter-7B | 39.13 | Otter: A Multi-Modal Model with In-Context Instruction Tuning | |