Visual Instruction Following on LLaVA-Bench
Metrics
avg score (higher is better)
Results
Performance of various models on this benchmark, reported as avg score. Entries that share a name likely correspond to different submissions (e.g., model sizes or variants) from the same paper.
Comparison Table
| Model Name | avg score |
|---|---|
| cumo-scaling-multimodal-llm-with-co-upcycled | 85.7 |
| sharegpt4v-improving-large-multi-modal-models | 79.9 |
| sharegpt4v-improving-large-multi-modal-models | 72.6 |
| improved-baselines-with-visual-instruction | 70.7 |
| improved-baselines-with-visual-instruction | 63.4 |
| instructblip-towards-general-purpose-vision | 60.9 |
| instructblip-towards-general-purpose-vision | 58.2 |
| blip-2-bootstrapping-language-image-pre | 38.1 |
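
For reference, here is a minimal sketch of how the table above could be ranked programmatically. The scores are copied verbatim from the comparison table; the ranking helper itself is illustrative and not part of any official benchmark tooling.

```python
# Results as (model, avg score) pairs, taken from the table above.
results = [
    ("cumo-scaling-multimodal-llm-with-co-upcycled", 85.7),
    ("sharegpt4v-improving-large-multi-modal-models", 79.9),
    ("sharegpt4v-improving-large-multi-modal-models", 72.6),
    ("improved-baselines-with-visual-instruction", 70.7),
    ("improved-baselines-with-visual-instruction", 63.4),
    ("instructblip-towards-general-purpose-vision", 60.9),
    ("instructblip-towards-general-purpose-vision", 58.2),
    ("blip-2-bootstrapping-language-image-pre", 38.1),
]

# Sort descending by avg score and print a ranked leaderboard.
for rank, (model, score) in enumerate(
    sorted(results, key=lambda r: r[1], reverse=True), start=1
):
    print(f"{rank:>2}. {model:<48} {score:5.1f}")
```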