HyperAI超神经
Zero Shot Video Question Answer On Video Mme 1
Evaluation metric: Accuracy (%)
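The page does not say how accuracy is computed. The sketch below shows one common approach and is only an illustration. It assumes each question is multiple-choice and that a prediction counts as correct only when its option letter exactly matches the reference answer, ignoring case and surrounding whitespace. The function name `accuracy_percent` and the sample answers are hypothetical.

```python
def accuracy_percent(predictions, answers):
    """Percentage of predictions that exactly match the reference answers.

    Hypothetical helper: assumes one option letter per question and
    case-insensitive exact-match scoring.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy over zero questions")
    # Count questions whose normalized predicted letter equals the gold letter.
    correct = sum(
        p.strip().upper() == a.strip().upper()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical example: 3 of 4 answers are correct.
preds = ["A", "C", "B", "D"]
gold = ["A", "B", "B", "D"]
print(accuracy_percent(preds, gold))  # 75.0
```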
Results
Performance of each model on this benchmark:
| Model | Accuracy (%) | Paper Title | Repository |
|---|---|---|---|
| GPT-4o mini | 68.9 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |
| VideoLLaMA2 (72B) | 63.1 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |
| BIMBA-LLaVA-Qwen2-7B | 64.67 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | - |
| Video-RAG (Based on LLaVA-Video) | 77.4 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | |
| VILA-1.5 (34B) | 64.1 | VILA: On Pre-training for Visual Language Models | |
| Gemini 1.5 Pro | 81.3 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| LongVU (7B) | 60.6 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | |
| MiniCPM-V 2.6 (8B) | 63.7 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | |
| Gemini 1.5 Flash | 75.0 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| GPT-4o | 77.2 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |
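The table follows the page's own row order and is not sorted by score. The sketch below ranks the models by accuracy. It copies the ten scores exactly as they appear in the table. The helper name `rank` is hypothetical, and the sketch assumes that ties, if any appeared, would keep their table order (Python's `sorted` is stable).

```python
# Accuracy (%) values copied from the table above, in page order.
results = {
    "GPT-4o mini": 68.9,
    "VideoLLaMA2 (72B)": 63.1,
    "BIMBA-LLaVA-Qwen2-7B": 64.67,
    "Video-RAG (Based on LLaVA-Video)": 77.4,
    "VILA-1.5 (34B)": 64.1,
    "Gemini 1.5 Pro": 81.3,
    "LongVU (7B)": 60.6,
    "MiniCPM-V 2.6 (8B)": 63.7,
    "Gemini 1.5 Flash": 75.0,
    "GPT-4o": 77.2,
}


def rank(scores):
    """Return (position, model, accuracy) tuples, highest accuracy first."""
    ordered = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(i, name, acc) for i, (name, acc) in enumerate(ordered, start=1)]


for position, name, acc in rank(results):
    print(f"{position:>2}. {name}: {acc:.2f}")
```

With these values, Gemini 1.5 Pro ranks first and LongVU (7B) ranks last.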