Video Question Answering on MVBench

Metrics: Avg. (average accuracy across the benchmark's tasks)

Results

Performance results of various models on this benchmark.

Comparison Table
| Model Name | Avg. |
| --- | --- |
| st-llm-large-language-models-are-effective-1 | 54.9 |
| tarsier-recipes-for-training-and-evaluating-1 | 67.6 |
| ppllava-varied-video-sequence-understanding | 59.2 |
| minigpt-4-enhancing-vision-language | 18.8 |
| videochat-chat-centric-video-understanding | 35.5 |
| oryx-mllm-on-demand-spatial-temporal | 64.7 |
| instructblip-towards-general-purpose-vision | 32.5 |
| longvu-spatiotemporal-adaptive-compression | 66.9 |
| video-llama-an-instruction-tuned-audio-visual | 34.1 |
| linvt-empower-your-image-level-large-language | 69.3 |
| hawkeye-training-video-text-llms-for | 47.55 |
| video-chatgpt-towards-detailed-video | 32.7 |
| sphinx-x-scaling-data-and-parameters-for-a | 39.7 |
| videollama-2-advancing-spatial-temporal | 62.0 |
| mvbench-a-comprehensive-multi-modal-video | 51.9 |
| mplug-owl3-towards-long-image-sequence | 59.5 |
| pllava-parameter-free-llava-extension-from-1 | 58.1 |
| internvideo2-scaling-video-foundation-models | 67.2 |
| videogpt-integrating-image-and-video-encoders | 58.7 |
| visual-instruction-tuning-1 | 36.0 |
| timechat-a-time-sensitive-multimodal-large | 38.5 |