
Visual Question Answering (VQA) on VLM2-Bench

Metrics

Average Score on VLM2-bench (9 subtasks)
GC-mat
GC-trk
OC-cnt
OC-cpr
OC-grp
PC-VID
PC-cnt
PC-cpr
PC-grp

Results

Performance of various models on this benchmark, reported per subtask alongside the overall average.

Comparison Table
| Model Name | Average Score (9 subtasks) | GC-mat | GC-trk | OC-cnt | OC-cpr | OC-grp | PC-VID | PC-cnt | PC-cpr | PC-grp |
|---|---|---|---|---|---|---|---|---|---|---|
| mplug-owl3-towards-long-image-sequence | 37.85 | 17.37 | 18.26 | 62.97 | 49.17 | 31.00 | 13.50 | 58.86 | 63.50 | 26.00 |
| llava-onevision-easy-visual-task-transfer | 39.35 | 16.60 | 13.70 | 56.17 | 47.22 | 27.50 | 47.25 | 46.67 | 62.00 | 37.00 |
| expanding-performance-boundaries-of-open | 45.59 | 30.50 | 30.59 | 51.48 | 43.33 | 52.50 | 21.75 | 59.70 | 59.50 | 61.00 |
| video-instruction-tuning-with-synthetic-data | 43.32 | 18.53 | 12.79 | 62.47 | 54.72 | 28.50 | 59.00 | 66.91 | 62.00 | 25.00 |
| long-context-transfer-from-language-to-vision | 22.59 | 14.29 | 19.18 | 42.53 | 26.67 | 18.50 | 3.75 | 38.90 | 21.50 | 18.00 |
| expanding-performance-boundaries-of-open | 41.23 | 21.24 | 26.03 | 55.23 | 53.33 | 46.50 | 5.25 | 60.00 | 51.50 | 52.00 |
| qwen2-vl-enhancing-vision-language-model-s | 42.37 | 27.80 | 19.18 | 45.99 | 68.06 | 35.00 | 16.25 | 58.59 | 61.50 | 49.00 |
| qwen2-5-vl-technical-report | 54.82 | 35.91 | 43.38 | 41.72 | 71.39 | 47.50 | 46.50 | 57.98 | 80.00 | 69.00 |
| gpt-4o-system-card | 60.36 | 37.45 | 39.27 | 80.62 | 74.17 | 57.50 | 66.75 | 90.50 | 50.00 | 47.00 |
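
The reported overall score is consistent with an unweighted mean of the nine subtask scores (e.g., gpt-4o-system-card: (37.45 + 39.27 + 80.62 + 74.17 + 57.50 + 66.75 + 90.50 + 50.00 + 47.00) / 9 ≈ 60.36). Below is a minimal Python sketch that checks this from the table values; the equal weighting of subtasks is an assumption inferred from the numbers above, not stated by the benchmark page.

```python
# Sketch: verify that "Average Score (9 subtasks)" matches the unweighted
# mean of the nine per-subtask scores in the comparison table.
# Assumption: all subtasks are weighted equally (consistent with the table).

SUBTASKS = ["GC-mat", "GC-trk", "OC-cnt", "OC-cpr", "OC-grp",
            "PC-VID", "PC-cnt", "PC-cpr", "PC-grp"]

# Per-subtask scores copied from the table above, in SUBTASKS order.
results = {
    "gpt-4o-system-card":
        [37.45, 39.27, 80.62, 74.17, 57.50, 66.75, 90.50, 50.00, 47.00],
    "qwen2-5-vl-technical-report":
        [35.91, 43.38, 41.72, 71.39, 47.50, 46.50, 57.98, 80.00, 69.00],
}

for model, scores in results.items():
    avg = sum(scores) / len(scores)
    print(f"{model}: {avg:.2f}")

# Output:
# gpt-4o-system-card: 60.36
# qwen2-5-vl-technical-report: 54.82
```

The same calculation reproduces the listed average for every row of the table, to two decimal places.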