Zero-Shot Video Question Answering on IntentQA

Metrics

Accuracy

Results

Performance of various models on this benchmark

Model	Accuracy (%)	Paper Title
ENTER	71.5	ENTER: Event Based Interpretable Reasoning for VideoQA
LVNet	71.1	Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA
TS-LLaVA-34B	67.9	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
VidCtx (7B)	67.1	VidCtx: Context-aware Video Question Answering with Image Models
VideoTree (GPT4)	66.9	VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
IG-VLM	65.3	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
LLoVi (GPT-4)	64.0	A Simple LLM Framework for Long-Range Video Question-Answering
SeViLA (4B)	60.9	Self-Chained Image-Language Model for Video Localization and Question Answering
SlowFast-LLaVA-34B	60.1	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
LangRepo (12B)	59.1	Language Repository for Long Video Understanding
LLoVi (7B)	53.6	A Simple LLM Framework for Long-Range Video Question-Answering
Mistral (7B)	50.4	Mistral 7B
Random	20.0	-
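
The Random row's 20.0 matches chance accuracy when each question has five candidate answers (1/5 = 20%). The five-option format is inferred from that row and is an assumption here. The sketch below shows how multiple-choice accuracy and that chance baseline could be computed. The function names `accuracy_percent` and `random_baseline_percent` are hypothetical and are not the benchmark's official evaluation code.

```python
def accuracy_percent(predictions, answers):
    """Percentage of questions whose predicted option index equals the gold index.

    Hypothetical helper: assumes answers are stored as option indices.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def random_baseline_percent(num_options):
    """Expected accuracy (%) of guessing uniformly among num_options choices."""
    return 100.0 / num_options


print(accuracy_percent([0, 2, 1, 4], [0, 2, 3, 4]))  # → 75.0
print(random_baseline_percent(5))                    # → 20.0 (assumed 5 options)
```

Under this assumption, a score near 20.0 means a model does no better than guessing.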

