| Model | Score | Reference |
| --- | --- | --- |
| MGM-2B (w/o LoRA, w/ extra data) | 93.24 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models |
| LLaVA-Med-v1.0 (w/o LoRA, w/ extra data) | 93.84 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| LLaVA-Med-v1.5 (w/ LoRA, w/ extra data) | 87.22 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| Bunny-v1.0-3B (w/ LoRA, w/o extra data) | 91.16 | Efficient Multimodal Learning from Data-centric Perspective |
| Bunny-v1.0-3B (w/ LoRA, w/ extra data) | 92.47 | Efficient Multimodal Learning from Data-centric Perspective |
| MiniGPT-v2 (w/ LoRA, w/ extra data) | 90.00 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning |
| LLaVA-v1.5 (w/ LoRA, w/o extra data) | 92.97 | Improved Baselines with Visual Instruction Tuning |
| LLaVA-v1.5 (w/ LoRA, w/ extra data) | 93.33 | Improved Baselines with Visual Instruction Tuning |
| MGM-2B (w/o LoRA, w/o extra data) | 92.97 | Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models |
| ColonGPT (w/ LoRA, w/o extra data) | 94.06 | Frontiers in Intelligent Colonoscopy |
| LLaVA-Med-v1.5 (w/ LoRA, w/o extra data) | 93.62 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| LLaVA-Med-v1.0 (w/o LoRA, w/o extra data) | 93.52 | LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day |
| MobileVLM-1.7B (w/o LoRA, w/ extra data) | 93.02 | MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices |
| MobileVLM-1.7B (w/ LoRA, w/ extra data) | 93.64 | MobileVLM : A Fast, Strong and Open Vision Language Assistant for Mobile Devices |
| LLaVA-v1 (w/ LoRA, w/ extra data) | 89.61 | Visual Instruction Tuning |
| MiniGPT-v2 (w/ LoRA, w/o extra data) | 91.49 | MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning |
| LLaVA-v1 (w/ LoRA, w/o extra data) | 87.86 | Visual Instruction Tuning |