Video Question Answering On Perception Test

Accuracy (Top-1)

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy (Top-1)	Paper Title
Oryx (34B)	71.4	Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
BIMBA-LLaVA-Qwen2-7B	68.51	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
InternVideo2 (8B)	63.4	InternVideo2: Scaling Foundation Models for Multimodal Video Understanding
VideoLLaMA2 (72B)	57.5	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
TraveLER	50.2	TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering
Flamingo	0.46	Perception Test: A Diagnostic Benchmark for Multimodal Video Models
