HyperAI초신경

Zero-Shot Video Question Answering on IntentQA

Evaluation Metric

Accuracy
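
As a rough illustration of the metric, accuracy on a multiple-choice benchmark is the percentage of questions for which the predicted option matches the ground-truth option. The sketch below is not taken from any of the listed papers. The function name `accuracy` and the toy data are hypothetical, and it assumes five answer options per question, which is consistent with the Random row of 20.0 in the results.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the ground-truth answers."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: indices of the chosen option for five questions.
answers = [0, 3, 1, 4, 2]
predictions = [0, 3, 2, 4, 2]
print(accuracy(predictions, answers))  # 80.0

# Assumption: five options per question, so uniform guessing is expected
# to score 100 / 5 = 20.0.
NUM_OPTIONS = 5
print(100.0 / NUM_OPTIONS)  # 20.0
```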

Evaluation Results

Performance of each model on this benchmark

Model Name	Accuracy (%)	Paper Title	Repository
IG-VLM	65.3	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
VideoTree (GPT-4)	66.9	VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VidCtx (7B)	67.1	VidCtx: Context-aware Video Question Answering with Image Models
LLoVi (GPT-4)	64.0	A Simple LLM Framework for Long-Range Video Question-Answering
LangRepo (12B)	59.1	Language Repository for Long Video Understanding
SeViLA (4B)	60.9	Self-Chained Image-Language Model for Video Localization and Question Answering
LVNet	71.1	Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
ENTER	71.5	ENTER: Event Based Interpretable Reasoning for VideoQA	-
LLoVi (7B)	53.6	A Simple LLM Framework for Long-Range Video Question-Answering
Mistral (7B)	50.4	Mistral 7B
TS-LLaVA-34B	67.9	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
SlowFast-LLaVA-34B	60.1	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Random	20.0	-	-
