Visual Question Answering on MMBench

GPT-3.5 score

Results

Performance results of various models on this benchmark

Model name	GPT-3.5 score	Paper Title
Video-LaVIT	67.3	Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
DreamLLM-7B	49.9	DreamLLM: Synergistic Multimodal Comprehension and Creation
CuMo-7B	73.0	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts
LLaVA-InternLM2-ViT + MoSLoRA	73.8	Mixture-of-Subspaces in Low-Rank Adaptation
LLaVA-LLaMA3-8B-ViT + MoSLoRA	73.0	Mixture-of-Subspaces in Low-Rank Adaptation
