Video Question Answering On Perception Test
Metrics
Accuracy (Top-1)
Results
Performance of different models on this benchmark
Model Name | Accuracy (Top-1) | Paper Title | Repository |
---|---|---|---|
TraveLER | 50.2 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | - |
VideoLLaMA2 (72B) | 57.5 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | - |
InternVideo2 (8B) | 63.4 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | - |
BIMBA-LLaVA-Qwen2-7B | 68.51 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | - |
Oryx (34B) | 71.4 | Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution | - |
Flamingo | 0.46 | Perception Test: A Diagnostic Benchmark for Multimodal Video Models | - |