HyperAI
Video-Based Generative Performance Benchmarking (Temporal Understanding)
Video Based Generative Performance 5
Metrics
gpt-score
Results
Performance results of various models on this benchmark
| Model Name | gpt-score | Paper Title |
|---|---|---|
| PPLLaVA-7B | 3.21 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance |
| ST-LLM | 2.93 | ST-LLM: Large Language Models Are Effective Temporal Learners |
| VideoGPT+ | 2.83 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding |
| SlowFast-LLaVA-34B | 2.77 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models |
| TS-LLaVA-34B | 2.77 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models |
| PLLaVA-34B | 2.67 | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning |
| VideoChat2 | 2.66 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| MiniGPT4-video-7B | 2.65 | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens |
| VideoChat2_HD_mistral | 2.65 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| VTimeLLM | 2.49 | VTimeLLM: Empower LLM to Grasp Video Moments |
| Chat-UniVi | 2.39 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding |
| BT-Adapter | 2.34 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| MovieChat | 2.24 | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding |
| BT-Adapter (zero-shot) | 2.13 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| LLaMA Adapter | 1.98 | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model |
| Video-ChatGPT | 1.98 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models |
| Video Chat | 1.94 | VideoChat: Chat-Centric Video Understanding |
| Video LLaMA | 1.82 | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding |