HyperAI초신경

VCGBench-Diverse on VideoInstruct

Evaluation Metrics

Consistency

Contextual Understanding

Correctness of Information

Dense Captioning

Detail Orientation

Reasoning

Spatial Understanding

Temporal Understanding

mean

Evaluation Results

Performance of each model on this benchmark

Model	Consistency	Contextual Understanding	Correctness of Information	Dense Captioning	Detail Orientation	Reasoning	Spatial Understanding	Temporal Understanding	mean	Paper Title
BT-Adapter	2.27	2.59	2.20	1.03	2.62	3.62	2.35	1.29	2.19	BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
VideoGPT+	2.59	2.81	2.46	1.38	2.73	3.63	2.80	1.78	2.47	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Chat-UniVi	2.36	2.66	2.29	1.33	2.56	3.59	2.36	1.56	2.29	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
VideoChat2	2.27	2.51	2.13	1.26	2.42	3.13	2.43	1.66	2.20	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
VTimeLLM	2.35	2.48	2.16	1.13	2.41	3.45	2.29	1.46	2.17	VTimeLLM: Empower LLM to Grasp Video Moments
Video-ChatGPT	2.06	2.46	2.07	0.89	2.42	3.60	2.25	1.39	2.08	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
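
The page does not say how the mean column is aggregated. Judging from the numbers alone, it appears to match the average of five of the eight metrics (Consistency, Contextual Understanding, Correctness of Information, Detail Orientation, Temporal Understanding) rather than all eight. The sketch below checks that assumption against the rows above and ranks the models by the reported mean. The helper names (`five_metric_mean`, `check_means`, `rank_by_mean`) are hypothetical and chosen for illustration only.

```python
from statistics import mean

# Per-model scores copied from the table above, in this order:
# Consistency, Contextual Understanding, Correctness of Information,
# Detail Orientation, Temporal Understanding, then the reported "mean".
SCORES = {
    "BT-Adapter":    (2.27, 2.59, 2.20, 2.62, 1.29, 2.19),
    "VideoGPT+":     (2.59, 2.81, 2.46, 2.73, 1.78, 2.47),
    "Chat-UniVi":    (2.36, 2.66, 2.29, 2.56, 1.56, 2.29),
    "VideoChat2":    (2.27, 2.51, 2.13, 2.42, 1.66, 2.20),
    "VTimeLLM":      (2.35, 2.48, 2.16, 2.41, 1.46, 2.17),
    "Video-ChatGPT": (2.06, 2.46, 2.07, 2.42, 1.39, 2.08),
}


def five_metric_mean(row):
    """Average the first five entries (the assumed aggregation), to two decimals."""
    return round(mean(row[:5]), 2)


def check_means(scores):
    """Return {model: (recomputed mean, reported mean)} for every row."""
    return {name: (five_metric_mean(row), row[5]) for name, row in scores.items()}


def rank_by_mean(scores):
    """Model names sorted by the reported mean, best first."""
    return sorted(scores, key=lambda name: scores[name][5], reverse=True)


if __name__ == "__main__":
    for name, (recomputed, reported) in check_means(SCORES).items():
        print(f"{name:14s} recomputed={recomputed:.2f} reported={reported:.2f}")
    print(rank_by_mean(SCORES))
```

If the assumption holds, the recomputed and reported values agree for all six rows. Dense Captioning, Reasoning, and Spatial Understanding are left out of this average, so a model's reported mean can differ from a plain average over every column.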
