HyperAI超神经


Zero-Shot Video Question Answering on IntentQA

Evaluation Metric

Accuracy
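
For reference, a minimal sketch of how an accuracy score on a multiple-choice benchmark is typically computed, reported as a percentage to match the scale of the results table. The function name `accuracy` and the sample predictions are hypothetical and are not taken from the benchmark's own evaluation code.

```python
def accuracy(predictions, answers):
    """Return the percentage of questions answered correctly."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    # Count positions where the predicted option index matches the gold index.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical data: option indices chosen by a model vs. the gold answers.
preds = [0, 2, 4, 1, 3]
gold = [0, 2, 1, 1, 0]
score = accuracy(preds, gold)  # 3 of the 5 predictions match
```

Assuming each question offers five answer options, uniform random guessing would be expected to score about 100 / 5 = 20, which is consistent with the Random row in the results table.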

Results

Performance of each model on this benchmark

Model	Accuracy (%)	Paper Title	Repository
IG-VLM	65.3	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
VideoTree (GPT-4)	66.9	VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
VidCtx (7B)	67.1	VidCtx: Context-aware Video Question Answering with Image Models
LLoVi (GPT-4)	64.0	A Simple LLM Framework for Long-Range Video Question-Answering
LangRepo (12B)	59.1	Language Repository for Long Video Understanding
SeViLA (4B)	60.9	Self-Chained Image-Language Model for Video Localization and Question Answering
LVNet	71.1	Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
ENTER	71.5	ENTER: Event Based Interpretable Reasoning for VideoQA	-
LLoVi (7B)	53.6	A Simple LLM Framework for Long-Range Video Question-Answering
Mistral (7B)	50.4	Mistral 7B
TS-LLaVA-34B	67.9	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
SlowFast-LLaVA-34B	60.1	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Random	20.0	-	-
