Video Question Answering on TVBench

Evaluation Metric

Average Accuracy

Evaluation Results

Performance of each model on this benchmark

Model	Average Accuracy	Paper Title
Tarsier-34B	55.5	Tarsier: Recipes for Training and Evaluating Large Video Description Models
Tarsier2-7B	54.7	Tarsier2: Advancing Large Vision-Language Models from Detailed Video Description to Comprehensive Video Understanding
Qwen2-VL-72B	52.7	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
IXC-2.5 7B	51.6	InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Aria	51.0	Aria: An Open Multimodal Native Mixture-of-Experts Model
LLaVA-Video 72B	50.0	Video Instruction Tuning With Synthetic Data
VideoLLaMA2 72B	48.4	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
Gemini 1.5 Pro	47.6	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Tarsier-7B	46.9	Tarsier: Recipes for Training and Evaluating Large Video Description Models
LLaVA-Video 7B	45.6	Video Instruction Tuning With Synthetic Data
Qwen2-VL-7B	43.8	Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution
VideoLLaMA2 7B	42.9	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
PLLaVA-34B	42.3	PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
mPLUG-Owl3	42.2	mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models
VideoLLaMA2.1	42.1	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
VideoGPT+	41.7	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
GPT4o 8 frames	39.9	GPT-4o System Card
PLLaVA-13B	36.4	PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
ST-LLM	35.7	ST-LLM: Large Language Models Are Effective Temporal Learners
VideoChat2	35.0	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
