Video Based Generative Performance 4 (Detail Orientation)

Metric: gpt-score

Results: performance of different models on this benchmark.

| Model | gpt-score | Paper Title |
|---|---|---|
| PPLLaVA-7B | 3.56 | PPLLaVA: Varied Video Sequence Understanding With Prompt Guidance |
| PLLaVA-34B | 3.20 | PLLaVA: Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning |
| VideoGPT+ | 3.18 | VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding |
| VTimeLLM | 3.10 | VTimeLLM: Empower LLM to Grasp Video Moments |
| ST-LLM | 3.05 | ST-LLM: Large Language Models Are Effective Temporal Learners |
| TS-LLaVA-34B | 3.03 | TS-LLaVA: Constructing Visual Tokens through Thumbnail-and-Sampling for Training-Free Video Large Language Models |
| MiniGPT4-video-7B | 3.02 | MiniGPT4-Video: Advancing Multimodal LLMs for Video Understanding with Interleaved Visual-Textual Tokens |
| SlowFast-LLaVA-34B | 2.96 | SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models |
| MovieChat | 2.93 | MovieChat: From Dense Token to Sparse Memory for Long Video Understanding |
| Chat-UniVi | 2.91 | Chat-UniVi: Unified Visual Representation Empowers Large Language Models with Image and Video Understanding |
| VideoChat2 | 2.88 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| VideoChat2_HD_mistral | 2.86 | MVBench: A Comprehensive Multi-modal Video Understanding Benchmark |
| BT-Adapter | 2.69 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| Video-ChatGPT | 2.52 | Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models |
| Video Chat | 2.50 | VideoChat: Chat-Centric Video Understanding |
| BT-Adapter (zero-shot) | 2.46 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| LLaMA Adapter | 2.32 | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model |
| Video LLaMA | 2.18 | Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding |