Zero Shot Video Question Answer On Video Mme 1

Evaluation Metric

Accuracy (%)

Evaluation Results

Performance of each model on this benchmark

| Model Name | Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| GPT-4o mini | 68.9 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |
| VideoLLaMA2 (72B) | 63.1 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |
| BIMBA-LLaVA-Qwen2-7B | 64.67 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | |
| Video-RAG (Based on LLaVA-Video) | 77.4 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | |
| VILA-1.5 (34B) | 64.1 | VILA: On Pre-training for Visual Language Models | |
| Gemini 1.5 Pro | 81.3 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| LongVU (7B) | 60.6 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding | |
| MiniCPM-V 2.6 (8B) | 63.7 | MiniCPM-V: A GPT-4V Level MLLM on Your Phone | |
| Gemini 1.5 Flash | 75.0 | Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | |
| GPT-4o | 77.2 | GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding | - |