Zero-Shot Video Retrieval on LSMDC
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
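The page lists these metrics without defining them. As a rough illustration, text-to-video Recall@K (R@K) is commonly computed as the percentage of text queries whose correct video appears among the top K retrieved videos. The sketch below assumes one ground-truth video per query, stored at the same index as the query, and counts ties in the model's favor. The function name `recall_at_k` and the toy scores are hypothetical and are not taken from any listed paper.

```python
def recall_at_k(sim, k):
    """Text-to-video Recall@K, in percent.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank: number of videos scored strictly higher
        # than the correct one (ties count in the model's favor).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy similarity matrix: 4 text queries x 4 videos (made-up scores).
sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.7, 0.6, 0.3, 0.5],  # correct video ranked 4th
    [0.1, 0.2, 0.3, 0.9],  # correct video ranked 1st
]

for k in (1, 2, 4):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
```

Under this definition R@1 ≤ R@5 ≤ R@10 for any model, which is consistent with every row of the results table.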
Results
Performance results of various models on this benchmark
| Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| BT-Adapter | 19.5 | 45.0 | 35.9 | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | |
| HowToCaption | 17.3 | 38.6 | 31.7 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | |
| HiTeA-17M | 18.3 | 44.2 | 36.7 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| Y. Ge et al. | 12.2 | 32.2 | 25.9 | Bridging Video-text Retrieval with Multiple Choice Questions | |
| SSML | 4.2 | 17.1 | 11.6 | Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning | |
| InternVideo2-6B | 33.8 | 62.2 | 55.9 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| CLIP4Clip | 15.1 | 36.4 | 28.5 | CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | |
| UMT-L (ViT-L/16) | 25.2 | 50.5 | 43.0 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| HiTeA-5M | 15.5 | 39.8 | 31.1 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| mPLUG-2 | 24.1 | 52.0 | 43.8 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| MILES | 11.1 | 30.6 | 24.7 | MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval | |
| InternVideo2-1B | 32.0 | 59.4 | 52.4 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| InternVideo | 17.6 | 40.2 | 32.4 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| Clover | 14.7 | 38.2 | 29.2 | Clover: Towards A Unified Video-Language Alignment and Fusion Model | |
| Yatai Ji et al. | 17.2 | 39.1 | 32.4 | Seeing What You Miss: Vision-Language Pre-training with Semantic Completion Learning | |
| VAST, HowToCaption-finetuned | 27.7 | 54.6 | 46.5 | HowToCaption: Prompting LLMs to Transform Video Annotations at Scale | |