HyperAI

Zero-Shot Video Question Answer on NExT-QA

Metrics

Accuracy

Results

Performance results of various models on this benchmark

Comparison Table
Model Name                                     Accuracy (%)
mvbench-a-comprehensive-multi-modal-video      61.7
vidctx-context-aware-video-question-answering  70.7
long-context-transfer-from-language-to-vision  67.1
an-image-grid-can-be-worth-a-video-zero-shot   70.9
understanding-long-videos-in-one-multimodal    55.2
question-instructed-visual-descriptions-for    66.3
videotree-adaptive-tree-based-video            73.5
self-chained-image-language-model-for-video-1  63.6
traveler-a-multi-lmm-agent-framework-for       68.2
zero-shot-video-question-answering-with        64.6
tarsier-recipes-for-training-and-evaluating-1  79.2
vipergpt-visual-inference-via-python           60.0
a-simple-llm-framework-for-long-range-video    67.7
morevqa-exploring-modular-reasoning-models     69.2
an-image-grid-can-be-worth-a-video-zero-shot   68.6
deepstack-deeply-stacking-visual-tokens-is     61.0
videoagent-long-form-video-understanding-with  71.3
too-many-frames-not-all-useful-efficient       72.9
a-simple-llm-framework-for-long-range-video    54.3
ts-llava-constructing-visual-tokens-through    73.6
enter-event-based-interpretable-reasoning-for  75.1
mistral-7b                                     51.1
slowfast-llava-a-strong-training-free          64.2
language-repository-for-long-video             60.9
verbs-in-action-improving-verb-understanding   51.5