HyperAI超神経

Zero Shot Video Question Answer On Video Mme 1

Evaluation Metric

Accuracy (%)

Evaluation Results

Performance of each model on this benchmark

Model	Accuracy (%)	Paper Title	Repository
GPT-4o mini	68.9	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	-
VideoLLaMA2 (72B)	63.1	VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
BIMBA-LLaVA-Qwen2-7B	64.67	BIMBA: Selective-Scan Compression for Long-Range Video Question Answering
Video-RAG (Based on LLaVA-Video)	77.4	Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension
VILA-1.5 (34B)	64.1	VILA: On Pre-training for Visual Language Models
Gemini 1.5 Pro	81.3	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
LongVU (7B)	60.6	LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding
MiniCPM-V 2.6 (8B)	63.7	MiniCPM-V: A GPT-4V Level MLLM on Your Phone
Gemini 1.5 Flash	75.0	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
GPT-4o	77.2	GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding	-
