HyperAI超神経


Zero-Shot Video Question Answering on IntentQA

Evaluation Metric

Accuracy
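
As a minimal sketch of how an accuracy score of this kind is typically computed, assuming multiple-choice questions with exactly one correct option each (the page does not describe the evaluation protocol): the score is the percentage of questions for which the predicted option index equals the ground-truth index. The function name `accuracy` and the sample data are hypothetical and are not taken from any of the listed repositories. The Random row at 20.0 is consistent with five answer options per question, though the page itself does not state this.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the percentage of questions whose predicted option matches the answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count exact matches between predicted and ground-truth option indices.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical data: 5 questions, option indices in the range 0-4.
preds = [0, 3, 2, 4, 1]
gold = [0, 3, 1, 4, 2]
print(f"{accuracy(preds, gold):.1f}")  # 3 of 5 correct -> 60.0
```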

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy (%)	Paper Title	Repository
IG-VLM	65.3	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
VideoTree (GPT4)	66.9	VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VidCtx (7B)	67.1	VidCtx: Context-aware Video Question Answering with Image Models
LLoVi (GPT-4)	64.0	A Simple LLM Framework for Long-Range Video Question-Answering
LangRepo (12B)	59.1	Language Repository for Long Video Understanding
SeViLA (4B)	60.9	Self-Chained Image-Language Model for Video Localization and Question Answering
LVNet	71.1	Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
ENTER	71.5	ENTER: Event Based Interpretable Reasoning for VideoQA	-
LLoVi (7B)	53.6	A Simple LLM Framework for Long-Range Video Question-Answering
Mistral (7B)	50.4	Mistral 7B
TS-LLaVA-34B	67.9	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
SlowFast-LLaVA-34B	60.1	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Random	20.0	-	-
