Zero Shot Video Question Answer On Egoschema 1
Evaluation metric
Accuracy
Evaluation results
Performance results for each model on this benchmark
| Model name | Accuracy | Paper Title | Repository |
|---|---|---|---|
| TimeChat (7B) | 33.0 | TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding | |
| VideoChat2_mistral | 54.4 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
| Tarsier (34B) | 61.7 | Tarsier: Recipes for Training and Evaluating Large Video Description Models | |
| LLoVi (GPT-3.5) | 50.3 | A Simple LLM Framework for Long-Range Video Question-Answering | |
| InternVideo | 32.1 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| VideoTree (GPT4) | 61.1 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos | |
| Vamos (GPT-4o) | 53.6 | Vamos: Versatile Action Models for Video Understanding | - |
| MVU (13B) | 37.6 | Understanding Long Videos with Multimodal Language Models | |
| SeViLA (4B) | 22.7 | Self-Chained Image-Language Model for Video Localization and Question Answering | |
| LVNet | 61.1 | Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA | |
| BIMBA-LLaVA-Qwen2-7B | 71.14 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering | - |
| Random | 20.0 | - | - |
| Vamos (GPT-4) | 48.3 | Vamos: Versatile Action Models for Video Understanding | - |
| VideoChat2_HD_mistral | 55.8 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
| VideoLLaMA2 (72B) | 63.9 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs | |
| Video-RAG (Based on LLaVA-Video) | 66.7 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension | |
| mPLUG-Owl (7B) | 31.1 | mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality | |
| LinVT-Qwen2-VL(7B) | 69.5 | LinVT: Empower Your Image-level Large Language Model to Understand Videos | |
| TraveLER | 53.3 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering | |
| VideoChat2_phi3 | 56.7 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark | |
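The Accuracy values above appear to be percentages of questions answered correctly, and the Random entry of 20.0 is consistent with uniform guessing among five answer options. The sketch below illustrates that reading only; the function names and the five-option assumption are hypothetical and do not come from the leaderboard or from any official evaluation script.

```python
from typing import Sequence


def accuracy_percent(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Percentage of questions whose predicted option index matches the answer."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("at least one question is required")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


def chance_accuracy_percent(num_options: int) -> float:
    """Expected accuracy of uniform random guessing over num_options choices."""
    if num_options < 1:
        raise ValueError("num_options must be positive")
    return 100.0 / num_options


if __name__ == "__main__":
    # Toy data (not from the benchmark): 3 of 4 predictions are correct.
    print(accuracy_percent([0, 1, 2, 3], [0, 1, 2, 4]))
    # Assumed five answer options per question.
    print(chance_accuracy_percent(5))
```

Under this reading, a model's score is best compared against the chance level rather than against zero, which is why an entry such as 22.7 sits only slightly above random guessing.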