
Visual Question Answering (VQA) on VLM2-Bench

Metrics

Average Score on VLM2-bench (9 subtasks)
GC-mat
GC-trk
OC-cnt
OC-cpr
OC-grp
PC-VID
PC-cnt
PC-cpr
PC-grp

Results

Performance of various models on this benchmark, reported per subtask alongside the overall average.

Comparison Table
| Model Name | Average Score (9 subtasks) | GC-mat | GC-trk | OC-cnt | OC-cpr | OC-grp | PC-VID | PC-cnt | PC-cpr | PC-grp |
|---|---|---|---|---|---|---|---|---|---|---|
| mplug-owl3-towards-long-image-sequence | 37.85 | 17.37 | 18.26 | 62.97 | 49.17 | 31.00 | 13.50 | 58.86 | 63.50 | 26.00 |
| llava-onevision-easy-visual-task-transfer | 39.35 | 16.60 | 13.70 | 56.17 | 47.22 | 27.50 | 47.25 | 46.67 | 62.00 | 37.00 |
| expanding-performance-boundaries-of-open | 45.59 | 30.50 | 30.59 | 51.48 | 43.33 | 52.50 | 21.75 | 59.70 | 59.50 | 61.00 |
| video-instruction-tuning-with-synthetic-data | 43.32 | 18.53 | 12.79 | 62.47 | 54.72 | 28.50 | 59.00 | 66.91 | 62.00 | 25.00 |
| long-context-transfer-from-language-to-vision | 22.59 | 14.29 | 19.18 | 42.53 | 26.67 | 18.50 | 3.75 | 38.90 | 21.50 | 18.00 |
| expanding-performance-boundaries-of-open | 41.23 | 21.24 | 26.03 | 55.23 | 53.33 | 46.50 | 5.25 | 60.00 | 51.50 | 52.00 |
| qwen2-vl-enhancing-vision-language-model-s | 42.37 | 27.80 | 19.18 | 45.99 | 68.06 | 35.00 | 16.25 | 58.59 | 61.50 | 49.00 |
| qwen2-5-vl-technical-report | 54.82 | 35.91 | 43.38 | 41.72 | 71.39 | 47.50 | 46.50 | 57.98 | 80.00 | 69.00 |
| gpt-4o-system-card | 60.36 | 37.45 | 39.27 | 80.62 | 74.17 | 57.50 | 66.75 | 90.50 | 50.00 | 47.00 |
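
The reported overall score is consistent with an unweighted mean of the nine subtask scores (e.g., gpt-4o-system-card: (37.45 + 39.27 + 80.62 + 74.17 + 57.50 + 66.75 + 90.50 + 50.00 + 47.00) / 9 ≈ 60.36). Below is a minimal Python sketch that checks this from the table values; the equal weighting of subtasks is an assumption inferred from the numbers above, not stated by the benchmark page.

```python
# Sketch: verify that "Average Score (9 subtasks)" matches the unweighted
# mean of the nine per-subtask scores in the comparison table.
# Assumption: all subtasks are weighted equally (consistent with the table).

SUBTASKS = ["GC-mat", "GC-trk", "OC-cnt", "OC-cpr", "OC-grp",
            "PC-VID", "PC-cnt", "PC-cpr", "PC-grp"]

# Per-subtask scores copied from the table above, in SUBTASKS order.
results = {
    "gpt-4o-system-card":
        [37.45, 39.27, 80.62, 74.17, 57.50, 66.75, 90.50, 50.00, 47.00],
    "qwen2-5-vl-technical-report":
        [35.91, 43.38, 41.72, 71.39, 47.50, 46.50, 57.98, 80.00, 69.00],
}

for model, scores in results.items():
    avg = sum(scores) / len(scores)
    print(f"{model}: {avg:.2f}")

# Output:
# gpt-4o-system-card: 60.36
# qwen2-5-vl-technical-report: 54.82
```

The same calculation reproduces the listed average for every row of the table, to two decimal places.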