| Model (setting) | Score | Reference |
| --- | --- | --- |
| LLaVA-Med-v1.5 (w/ LoRA, w/ extra data) | 70.00 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| LLaVA-Med-v1.5 (w/ LoRA, w/o extra data) | 73.05 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| LLaVA-v1 (w/ LoRA, w/ extra data) | 46.85 | Visual Instruction Tuning |
| MiniGPT-v2 (w/ LoRA, w/ extra data) | 70.23 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning |
| MGM-2B (w/o LoRA, w/ extra data) | 74.30 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models |
| MobileVLM-1.7B (w/o LoRA, w/ extra data) | 73.14 | MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices |
| MiniGPT-v2 (w/ LoRA, w/o extra data) | 72.05 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning |
| MGM-2B (w/o LoRA, w/o extra data) | 69.81 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models |
| LLaVA-Med-v1.0 (w/o LoRA, w/o extra data) | 75.07 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| LLaVA-v1 (w/ LoRA, w/o extra data) | 68.11 | Visual Instruction Tuning |
| ColonGPT (w/ LoRA, w/o extra data) | 80.18 | Frontiers in Intelligent Colonoscopy |
| LLaVA-Med-v1.0 (w/o LoRA, w/ extra data) | 75.25 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| Bunny-v1.0-3B (w/ LoRA, w/ extra data) | 75.08 | Efficient Multimodal Learning from Data-centric Perspective |
| LLaVA-v1.5 (w/ LoRA, w/o extra data) | 70.38 | Improved Baselines with Visual Instruction Tuning |
| LLaVA-v1.5 (w/ LoRA, w/ extra data) | 72.88 | Improved Baselines with Visual Instruction Tuning |
| Bunny-v1.0-3B (w/ LoRA, w/o extra data) | 69.45 | Efficient Multimodal Learning from Data-centric Perspective |
| MobileVLM-1.7B (w/ LoRA, w/ extra data) | 78.03 | MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices |