Visual Instruction Following on LLaVA-Bench
Metrics
avg score
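On LLaVA-Bench, the avg score is typically a judge-based relative score: for each question, a judge model (GPT-4 in the original LLaVA evaluation) rates both the candidate model's answer and a reference answer, the candidate's rating is expressed as a percentage of the reference's, and these per-question percentages are averaged. Below is a minimal sketch of that aggregation step only, assuming per-question ratings have already been collected; the function name and the sample ratings are hypothetical, not part of the benchmark's tooling.

```python
def relative_avg_score(candidate_ratings, reference_ratings):
    """Average relative score: each candidate rating is expressed as a
    percentage of the reference rating for the same question, then the
    percentages are averaged over all questions."""
    assert len(candidate_ratings) == len(reference_ratings), "one rating pair per question"
    per_question = [
        100.0 * cand / ref
        for cand, ref in zip(candidate_ratings, reference_ratings)
    ]
    return sum(per_question) / len(per_question)

# Hypothetical judge ratings (1-10 scale) for three questions.
candidate = [7, 8, 6]
reference = [9, 8, 8]
print(f"avg score: {relative_avg_score(candidate, reference):.1f}")
```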
Results
Performance results of various models on this benchmark, sorted by avg score.
| Model name | avg score | Paper Title |
|---|---|---|
| CuMo-7B | 85.7 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts |
| ShareGPT4V-13B | 79.9 | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions |
| ShareGPT4V-7B | 72.6 | ShareGPT4V: Improving Large Multi-Modal Models with Better Captions |
| LLaVA-v1.5-13B | 70.7 | Improved Baselines with Visual Instruction Tuning |
| LLaVA-v1.5-7B | 63.4 | Improved Baselines with Visual Instruction Tuning |
| InstructBLIP-7B | 60.9 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| InstructBLIP-13B | 58.2 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| BLIP-2 | 38.1 | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models |