HyperAI초신경




Zero-Shot Video Question Answering on Video-MME (1)

Evaluation metric

Accuracy (%)
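The page does not define the metric further. A minimal sketch, assuming the usual definition of accuracy for question answering (correct answers divided by total questions, times 100) and assuming each prediction and reference is a single option letter. The function name `accuracy_percent` is hypothetical and not part of any benchmark toolkit.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[str], answers: Sequence[str]) -> float:
    """Return the percentage of predictions that match the reference answers.

    Hypothetical helper: assumes one predicted option letter (e.g. "A" to "D")
    per question, aligned index-by-index with the reference answers.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    # Compare case-insensitively and ignore surrounding whitespace.
    correct = sum(
        p.strip().upper() == a.strip().upper() for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


preds = ["A", "C", "B", "D"]
gold = ["A", "B", "B", "D"]
print(accuracy_percent(preds, gold))  # 75.0
```

Under this reading, a score such as 68.9 means roughly 68.9 of every 100 questions were answered correctly.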

Evaluation results

Performance of each model on this benchmark.

Model name	Accuracy (%)	Paper title	Repository
GPT-4o mini	68.9	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	-
VideoLLaMA2 (72B)	63.1	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
BIMBA-LLaVA-Qwen2-7B	64.67	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Video-RAG (Based on LLaVA-Video)	77.4	Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
VILA-1.5 (34B)	64.1	VILA: On Pre-training for Visual Language Models
Gemini 1.5 Pro	81.3	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
LongVU (7B)	60.6	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
MiniCPM-V 2.6 (8B)	63.7	MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Gemini 1.5 Flash	75.0	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
GPT-4o	77.2	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	-
