Zero Shot Video Question Answer On Video Mme 1
Metrics
Accuracy (%)
Results
Performance results of various models on this benchmark
| Model Name | Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| GPT-4o mini | 68.9 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |
| VideoLLaMA2 (72B) | 63.1 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |
| BIMBA-LLaVA-Qwen2-7B | 64.67 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | - |
| Video-RAG (Based on LLaVA-Video) | 77.4 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | |
| VILA-1.5 (34B) | 64.1 | VILA: On Pre-training for Visual Language Models | |
| Gemini 1.5 Pro | 81.3 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| LongVU (7B) | 60.6 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | |
| MiniCPM-V 2.6 (8B) | 63.7 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | |
| Gemini 1.5 Flash | 75.0 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| GPT-4o | 77.2 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |