HyperAI
Zero-Shot Video Question Answering
Zero Shot Video Question Answer On Egoschema 1
Metrics
Accuracy
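This benchmark reports a single metric, Accuracy. The following assumes the usual multiple-choice setup, in which each question has exactly one correct option (EgoSchema questions are, to our knowledge, five-option multiple choice). Under that setup, accuracy is the share of questions answered correctly, expressed as a percentage. The minimal sketch below illustrates the computation. The function name `multiple_choice_accuracy` and the toy data are hypothetical and are not taken from any leaderboard's or paper's evaluation code.

```python
from typing import Sequence


def multiple_choice_accuracy(predictions: Sequence[int], answers: Sequence[int]) -> float:
    """Return the percentage of questions whose predicted option index equals the gold index.

    Hypothetical helper: assumes one gold option per question, with options
    encoded as integer indices (e.g. 0-4 for a five-option question).
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("need at least one question")
    # Count exact matches between predicted and gold option indices.
    correct = sum(p == a for p, a in zip(predictions, answers))
    return 100.0 * correct / len(answers)


# Hypothetical toy data: 4 questions, 3 answered correctly.
preds = [2, 0, 4, 1]
gold = [2, 3, 4, 1]
print(multiple_choice_accuracy(preds, gold))  # 75.0
```

Reporting the value as a percentage matches the scale of the scores in the results table (e.g. 71.14).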
Results
Performance of different models on this benchmark
| Model | Accuracy (%) | Paper Title |
|---|---|---|
| BIMBA-LLaVA-Qwen2-7B | 71.14 | BIMBA: Selective-Scan Compression for Long-Range Video Question Answering |
| LinVT-Qwen2-VL(7B) | 69.5 | LinVT: Empower Your Image-level Large Language Model to Understand Videos |
| LongVU (7B) | 67.6 | LongVU: Spatiotemporal Adaptive Compression for Long Video-Language Understanding |
| Video-RAG (Based on LLaVA-Video) | 66.7 | Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension |
| VideoLLaMA2 (72B) | 63.9 | VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs |
| Tarsier (34B) | 61.7 | Tarsier: Recipes for Training and Evaluating Large Video Description Models |
| VideoTree (GPT4) | 61.1 | VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos |
| LVNet | 61.1 | Too Many Frames, Not All Useful: Efficient Strategies for Long-Form Video QA |
| InternVideo2-6B | 60.2 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VideoChat2_phi3 | 56.7 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| VideoChat2_HD_mistral | 55.8 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| VideoChat2_mistral | 54.4 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| Vamos (GPT-4o) | 53.6 | Vamos: Versatile Action Models for Video Understanding |
| TraveLER | 53.3 | TraveLER: A Modular Multi-LMM Agent Framework for Video Question-Answering |
| LLoVi (GPT-3.5) | 50.3 | A Simple LLM Framework for Long-Range Video Question-Answering |
| Video ReCap | 50.23 | Video ReCap: Recursive Captioning of Hour-Long Videos |
| Vamos (GPT-4) | 48.3 | Vamos: Versatile Action Models for Video Understanding |
| LangRepo (12B) | 41.2 | Language Repository for Long Video Understanding |
| MVU (13B) | 37.6 | Understanding Long Videos with Multimodal Language Models |
| Vamos (13B) | 36.7 | Vamos: Versatile Action Models for Video Understanding |
Showing 20 of 27 entries.