HyperAI초신경

Zero-Shot Video Question Answering on EgoSchema

Evaluation Metric

Accuracy

Evaluation Results

Performance of each model on this benchmark

Model Name	Accuracy (%)	Paper Title	Repository
VideoChat2_HD_mistral	65.6	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
MVU (13B)	60.3	Understanding Long Videos with Multimodal Language Models
Random	20.0	-	-
LangRepo (12B)	66.2	Language Repository for Long Video Understanding
LLoVi (7B)	50.8	A Simple LLM Framework for Long-Range Video Question-Answering
SlowFast-LLaVA-34B	47.2	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
LLoVi (GPT-3.5)	57.6	A Simple LLM Framework for Long-Range Video Question-Answering
Tarsier (34B)	68.6	Tarsier: Recipes for Training and Evaluating Large Video Description Models
SeViLA (4B)	25.7	Self-Chained Image-Language Model for Video Localization and Question Answering
LVNet	66.0	Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
TS-LLaVA-34B	57.8	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
VideoTree (GPT4)	66.2	VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VideoChat2_mistral	63.6	MVBench: A Comprehensive Multi-modal Video Understanding Benchmark
