HyperAI
Zero-Shot Video Retrieval on ActivityNet
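Metrics
text-to-video R@1 (%)
video-to-text R@1 (%)
R@1 (Recall@1) is the percentage of queries whose correct match is ranked first among all candidates. Higher is better.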
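The sketch below shows how the two R@1 metrics can be computed from a text-video similarity matrix. It is a minimal illustration, not the evaluation code used by HyperAI or by any paper in the table. It assumes one caption per video, so that text i and video i form the ground-truth pair (the diagonal). The helper names (`cosine_similarity_matrix`, `recall_at_1`, `transpose`) and the toy scores are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def cosine_similarity_matrix(text_embs: Sequence[Sequence[float]],
                             video_embs: Sequence[Sequence[float]]) -> Matrix:
    """Row i, column j = cosine similarity between text i and video j.
    The embeddings are assumed to come from some zero-shot encoder pair."""
    def normalize(v: Sequence[float]) -> List[float]:
        n = math.sqrt(sum(x * x for x in v))
        return [x / n for x in v]

    texts = [normalize(t) for t in text_embs]
    videos = [normalize(v) for v in video_embs]
    return [[sum(a * b for a, b in zip(t, v)) for v in videos] for t in texts]


def recall_at_1(sim: Matrix) -> float:
    """Percentage of queries (rows) whose top-scoring candidate is the
    ground-truth match, assumed to be on the diagonal (query i <-> item i)."""
    hits = 0
    for i, row in enumerate(sim):
        # Index of the highest-scoring candidate (max returns the first on ties).
        best = max(range(len(row)), key=lambda j: row[j])
        hits += int(best == i)
    return 100.0 * hits / len(sim)


def transpose(m: Matrix) -> Matrix:
    """Swap queries and candidates: text-to-video scores -> video-to-text."""
    return [list(col) for col in zip(*m)]


# Toy scores: rows are text queries, columns are videos.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.8, 0.4],
    [0.1, 0.7, 0.6],
]
print(f"text-to-video R@1: {recall_at_1(sim):.1f}")             # 66.7
print(f"video-to-text R@1: {recall_at_1(transpose(sim)):.1f}")  # 100.0
```

In this toy case the third caption retrieves the wrong video, while every video still ranks its own caption first. This is why the two directions are reported separately: the same similarity matrix can give different scores depending on which side acts as the query.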
Results
Performance of different models on this benchmark (R@1 in %; "-" means no result was reported).
| Model | text-to-video R@1 | video-to-text R@1 | Paper title |
|---|---|---|---|
| InternVideo2-6B | 63.2 | 56.5 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| InternVideo2-1B | 60.4 | 54.8 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| GRAM | 59.0 | 50.9 | Gramian Multimodal Representation Learning and Alignment |
| vid-TLDR (UMT-L) | 42.8 | 41.2 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer |
| UMT-L (ViT-L/16) | 42.8 | 40.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| LanguageBind(ViT-H/14) | 41.0 | 39.1 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment |
| LanguageBind(ViT-L/14) | 38.4 | 35.7 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment |
| BT-Adapter | 37.0 | - | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| VideoCoCa | 34.5 | 33.0 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners |
| Singularity-temporal-5M | 30.8 | - | Revealing Single Frame Bias for Video-and-Language Learning |
| InternVideo | 30.7 | 31.4 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| Singularity-temporal-17M | 30.6 | - | Revealing Single Frame Bias for Video-and-Language Learning |