Video Question Answering on TVBench
Metrics
Average Accuracy
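For reference, the sketch below shows how average accuracy is typically computed for a multiple-choice video QA benchmark: the fraction of questions whose predicted option matches the ground-truth option, reported as a percentage. The field names `prediction` and `answer` are illustrative assumptions, not TVBench's actual schema.

```python
# Minimal sketch of average accuracy for multiple-choice video QA.
# Assumed record format: {"prediction": <chosen option>, "answer": <ground truth>}.
from typing import Dict, List


def average_accuracy(examples: List[Dict[str, str]]) -> float:
    """Percentage of questions whose predicted option equals the ground truth."""
    if not examples:
        return 0.0
    correct = sum(1 for ex in examples if ex["prediction"] == ex["answer"])
    return 100.0 * correct / len(examples)


if __name__ == "__main__":
    demo = [
        {"prediction": "A", "answer": "A"},
        {"prediction": "C", "answer": "B"},
        {"prediction": "D", "answer": "D"},
    ]
    print(f"Average Accuracy: {average_accuracy(demo):.1f}%")  # 66.7%
```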
Results
Average accuracy of various models on the TVBench video question answering benchmark
Comparison Table
Model (paper identifier) | Average Accuracy (%) |
---|---|
tarsier-recipes-for-training-and-evaluating-1 | 55.5 |
tarsier2-advancing-large-vision-language | 54.7 |
qwen2-vl-enhancing-vision-language-model-s | 52.7 |
internlm-xcomposer-2-5-a-versatile-large | 51.6 |
aria-an-open-multimodal-native-mixture-of | 51.0 |
video-instruction-tuning-with-synthetic-data | 50.0 |
videollama-2-advancing-spatial-temporal | 48.4 |
gemini-1-5-unlocking-multimodal-understanding | 47.6 |
tarsier-recipes-for-training-and-evaluating-1 | 46.9 |
video-instruction-tuning-with-synthetic-data | 45.6 |
qwen2-vl-enhancing-vision-language-model-s | 43.8 |
videollama-2-advancing-spatial-temporal | 42.9 |
pllava-parameter-free-llava-extension-from-1 | 42.3 |
mplug-owl3-towards-long-image-sequence | 42.2 |
videollama-2-advancing-spatial-temporal | 42.1 |
videogpt-integrating-image-and-video-encoders | 41.7 |
gpt-4o-system-card | 39.9 |
pllava-parameter-free-llava-extension-from-1 | 36.4 |
st-llm-large-language-models-are-effective-1 | 35.7 |
mvbench-a-comprehensive-multi-modal-video | 35.0 |
pllava-parameter-free-llava-extension-from-1 | 34.9 |