HyperAI
Video Question Answering on MVBench
Metric: Avg.

Performance results of various models on this benchmark.
| Model | Avg. | Paper Title |
|---|---|---|
| ST-LLM | 54.9 | ST-LLM: Large Language Models Are Effective Temporal Learners |
| Tarsier (34B) | 67.6 | Tarsier: Recipes for Training and Evaluating Large Video Description Models |
| PPLLaVA (7B) | 59.2 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance |
| MiniGPT-4 | 18.8 | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models |
| VideoChat | 35.5 | VideoChat: Chat-Centric Video Understanding |
| Oryx (34B) | 64.7 | Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution |
| InstructBLIP | 32.5 | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning |
| LongVU (7B) | 66.9 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding |
| VideoLLaMA | 34.1 | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding |
| LinVT-Qwen2-VL (7B) | 69.3 | LinVT: Empower Your Image-level Large Language Model to Understand Videos |
| HawkEye | 47.55 | HawkEye: Training Video-Text LLMs for Grounding Text in Videos |
| Video-ChatGPT | 32.7 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models |
| SPHINX-Plus | 39.7 | SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models |
| VideoLLaMA2 (72B) | 62.0 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs |
| VideoChat2 | 51.9 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| mPLUG-Owl3 (7B) | 59.5 | mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models |
| PLLaVA | 58.1 | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning |
| InternVideo2 | 67.2 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VideoGPT+ | 58.7 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding |
| LLaVA | 36.0 | Visual Instruction Tuning |
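The table lists the rows in the order the page rendered them rather than by score. A minimal sketch of re-ranking the same entries by their MVBench average, using only the model names and scores shown above (the `leaderboard` list and its layout are illustrative, not part of the benchmark's tooling):

```python
# Leaderboard rows from the table above as (model, avg_score) pairs.
leaderboard = [
    ("ST-LLM", 54.9),
    ("Tarsier (34B)", 67.6),
    ("PPLLaVA (7B)", 59.2),
    ("MiniGPT-4", 18.8),
    ("VideoChat", 35.5),
    ("Oryx (34B)", 64.7),
    ("InstructBLIP", 32.5),
    ("LongVU (7B)", 66.9),
    ("VideoLLaMA", 34.1),
    ("LinVT-Qwen2-VL (7B)", 69.3),
    ("HawkEye", 47.55),
    ("Video-ChatGPT", 32.7),
    ("SPHINX-Plus", 39.7),
    ("VideoLLaMA2 (72B)", 62.0),
    ("VideoChat2", 51.9),
    ("mPLUG-Owl3 (7B)", 59.5),
    ("PLLaVA", 58.1),
    ("InternVideo2", 67.2),
    ("VideoGPT+", 58.7),
    ("LLaVA", 36.0),
]

# Rank models best-first by their reported MVBench average accuracy.
ranked = sorted(leaderboard, key=lambda row: row[1], reverse=True)
for rank, (model, avg) in enumerate(ranked, start=1):
    print(f"{rank:2d}. {model:22s} {avg:5.1f}")
```

On these numbers, LinVT-Qwen2-VL (7B) at 69.3 leads, followed by Tarsier (34B) and InternVideo2.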