Zero-Shot Video Retrieval on LSMDC
Metrics
text-to-video R@1
text-to-video R@5
text-to-video R@10
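As a rough illustration of what these metrics measure, text-to-video R@K (recall at K) is usually understood as the percentage of text queries whose correct video appears among the K highest-scoring videos. The sketch below assumes a square similarity matrix in which query i is paired with video i; the function name `recall_at_k` and the toy scores are hypothetical and are not taken from any of the listed papers.

```python
import numpy as np

def recall_at_k(sim, k):
    """Percentage of text queries whose paired video ranks in the top k.

    sim[i, j] is the similarity between text query i and video j.
    Assumption: the correct video for query i is video i (the diagonal).
    """
    sim = np.asarray(sim, dtype=float)
    # Similarity of each query to its own paired video.
    correct = np.diag(sim)
    # Zero-based rank: how many videos score strictly higher than the paired one.
    ranks = (sim > correct[:, None]).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))

# Toy 3x3 example with made-up scores (not benchmark data).
toy_sim = [
    [0.9, 0.1, 0.2],  # paired video is ranked first
    [0.8, 0.3, 0.5],  # paired video is ranked third
    [0.1, 0.7, 0.6],  # paired video is ranked second
]
for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy_sim, k):.1f}")
```

Because the top-K set only grows with K, R@1 can never exceed R@5, and R@5 can never exceed R@10, which matches the pattern in every row of the results table.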
Results
Performance of different models on this benchmark
| Model | text-to-video R@1 | text-to-video R@5 | text-to-video R@10 | Paper Title |
|---|---|---|---|---|
| InternVideo2-6B | 33.8 | 55.9 | 62.2 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| InternVideo2-1B | 32.0 | 52.4 | 59.4 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| VAST, HowToCaption-finetuned | 27.7 | 46.5 | 54.6 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale |
| UMT-L (ViT-L/16) | 25.2 | 43.0 | 50.5 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| mPLUG-2 | 24.1 | 43.8 | 52.0 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video |
| BT-Adapter | 19.5 | 35.9 | 45.0 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
| HiTeA-17M | 18.3 | 36.7 | 44.2 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training |
| InternVideo | 17.6 | 32.4 | 40.2 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| HowToCaption | 17.3 | 31.7 | 38.6 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale |
| Yatai Ji et al. | 17.2 | 32.4 | 39.1 | - |
| HiTeA-5M | 15.5 | 31.1 | 39.8 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training |
| CLIP4Clip | 15.1 | 28.5 | 36.4 | CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval |
| Clover | 14.7 | 29.2 | 38.2 | Clover: Towards A Unified Video-Language Alignment and Fusion Model |
| Y. Ge et al. | 12.2 | 25.9 | 32.2 | Bridging Video-text Retrieval with Multiple Choice Questions |
| MILES | 11.1 | 24.7 | 30.6 | - |
| SSML | 4.2 | 11.6 | 17.1 | Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning |