HyperAI초신경

Zero-Shot Video Question Answering on TGIF-QA

Evaluation Metrics

Accuracy

Confidence Score

Evaluation Results

Performance of each model on this benchmark

Model Name	Accuracy	Confidence Score	Paper Title	Repository
Elysium	66.6	3.6	Elysium: Exploring Object-level Perception in Videos via MLLM
MiniGPT4-video-7B	72.22	-	MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens
TS-LLaVA-34B	81.0	4.2	TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models
Video Chat-7B	34.4	2.3	VideoChat: Chat-Centric Video Understanding
SlowFast-LLaVA-34B	80.6	4.3	SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Video-LLaVA-7B	70.0	4.0	Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
VideoGPT+	74.6	4.1	VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
LinVT-Qwen2-VL (7B)	81.3	4.3	LinVT: Empower Your Image-level Large Language Model to Understand Videos
IG-VLM	79.1	4.2	An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Video-ChatGPT-7B	51.4	3.0	Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models
PLLaVA	80.6	4.3	PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Chat-UniVi-7B	69.0	3.8	Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding
FrozenBiLM	41.9	-	Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Tarsier (34B)	82.5	4.4	Tarsier: Recipes for Training and Evaluating Large Video Description Models
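
The rows above are not listed in score order. As a minimal sketch, using only the values shown in the table, the following ranks the models by Accuracy. The names RESULTS and rank_by_accuracy are illustrative and are not part of the leaderboard; a missing Confidence Score ("-" in the table) is represented as None.

```python
# Rows copied from the table above: (model, accuracy, confidence score or None).
RESULTS = [
    ("Elysium", 66.6, 3.6),
    ("MiniGPT4-video-7B", 72.22, None),
    ("TS-LLaVA-34B", 81.0, 4.2),
    ("Video Chat-7B", 34.4, 2.3),
    ("SlowFast-LLaVA-34B", 80.6, 4.3),
    ("Video-LLaVA-7B", 70.0, 4.0),
    ("VideoGPT+", 74.6, 4.1),
    ("LinVT-Qwen2-VL (7B)", 81.3, 4.3),
    ("IG-VLM", 79.1, 4.2),
    ("Video-ChatGPT-7B", 51.4, 3.0),
    ("PLLaVA", 80.6, 4.3),
    ("Chat-UniVi-7B", 69.0, 3.8),
    ("FrozenBiLM", 41.9, None),
    ("Tarsier (34B)", 82.5, 4.4),
]


def rank_by_accuracy(rows):
    """Return rows sorted by accuracy, highest first.

    sorted() is stable, so models with equal accuracy keep their table order.
    """
    return sorted(rows, key=lambda row: row[1], reverse=True)


ranked = rank_by_accuracy(RESULTS)

for position, (model, accuracy, confidence) in enumerate(ranked, start=1):
    # Print "-" where the table reports no confidence score.
    confidence_text = "-" if confidence is None else f"{confidence:.1f}"
    print(f"{position:2d}. {model:<22} {accuracy:6.2f}  {confidence_text}")
```

Sorting on Accuracy alone leaves SlowFast-LLaVA-34B and PLLaVA tied at 80.6; the sketch keeps them in table order rather than inventing a tie-break.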
