Video Question Answering on MVBench
Metric
Avg.
Results
Performance of various models on this benchmark
Comparison table
| Model | Avg. |
|---|---|
| st-llm-large-language-models-are-effective-1 | 54.9 |
| tarsier-recipes-for-training-and-evaluating-1 | 67.6 |
| ppllava-varied-video-sequence-understanding | 59.2 |
| minigpt-4-enhancing-vision-language | 18.8 |
| videochat-chat-centric-video-understanding | 35.5 |
| oryx-mllm-on-demand-spatial-temporal | 64.7 |
| instructblip-towards-general-purpose-vision | 32.5 |
| longvu-spatiotemporal-adaptive-compression | 66.9 |
| video-llama-an-instruction-tuned-audio-visual | 34.1 |
| linvt-empower-your-image-level-large-language | 69.3 |
| hawkeye-training-video-text-llms-for | 47.55 |
| video-chatgpt-towards-detailed-video | 32.7 |
| sphinx-x-scaling-data-and-parameters-for-a | 39.7 |
| videollama-2-advancing-spatial-temporal | 62.0 |
| mvbench-a-comprehensive-multi-modal-video | 51.9 |
| mplug-owl3-towards-long-image-sequence | 59.5 |
| pllava-parameter-free-llava-extension-from-1 | 58.1 |
| internvideo2-scaling-video-foundation-models | 67.2 |
| videogpt-integrating-image-and-video-encoders | 58.7 |
| visual-instruction-tuning-1 | 36.0 |
| timechat-a-time-sensitive-multimodal-large | 38.5 |