HyperAI
Video Retrieval on LSMDC
Metrics: text-to-video Mean Rank, text-to-video R@1, text-to-video R@5, text-to-video R@10

Results
Performance of the different models on this benchmark. All metrics are text-to-video retrieval; Mean Rank is lower-is-better, R@K is higher-is-better.

| Model | Mean Rank ↓ | R@1 ↑ | R@5 ↑ | R@10 ↑ | Paper |
|---|---|---|---|---|---|
| CAMoE | 54.4 | 25.9 | 46.1 | 53.7 | Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss |
| HD-VILA | - | 17.4 | 34.1 | 44.1 | Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions |
| CLIP4Clip | 58.0 | 21.6 | 41.8 | 49.8 | CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval |
| Collaborative Experts | - | 11.2 | 26.9 | 34.8 | Use What You Have: Video Retrieval Using Representations From Collaborative Experts |
| MDMMT | 58.0 | 18.8 | 38.5 | 47.9 | MDMMT: Multidomain Multimodal Transformer for Video Retrieval |
| EMCL-Net | - | 23.9 | 42.4 | 50.9 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations |
| VALOR | - | 34.2 | 56.0 | 64.1 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset |
| X-Pool | 53.2 | 25.2 | 43.7 | 53.5 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval |
| EMCL-Net (Ours)++ | 8 | - | - | 53.7 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations |
| HunYuan_tvr (huge) | 3.9 | 40.4 | 80.1 | 92.8 | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations |
| InternVideo | - | 34.0 | - | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| MDMMT-2 | 48.0 | 26.9 | 46.7 | 55.9 | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization |
| CLIP | - | 11.3 | 22.7 | 29.2 | A Straightforward Framework For Video Retrieval Using CLIP |
| X-CLIP | - | 26.1 | - | - | X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval |
| HiTeA | - | 28.7 | 50.3 | 59.0 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training |
| MoEE | - | 10.1 | 25.6 | 34.6 | Learning a Text-Video Embedding from Incomplete and Heterogeneous Data |
| MMT-Pretrained | - | 13.5 | 29.9 | 40.1 | Multi-modal Transformer for Video Retrieval |
| QB-Norm+CLIP4Clip | - | 22.4 | 40.1 | 49.5 | Cross Modal Retrieval with Querybank Normalisation |
| CenterCLIP (ViT-B/16) | 47.3 | 24.2 | 46.2 | 55.9 | CenterCLIP: Token Clustering for Efficient Text-Video Retrieval |
| VIOLETv2 | - | 24.0 | 43.5 | 54.1 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling |
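The metrics above follow the standard text-to-video retrieval protocol: each text query is scored against all candidate videos, R@K is the percentage of queries whose ground-truth video lands in the top K, and Mean Rank is the average (1-based) rank of the ground-truth video. A minimal sketch of that computation, assuming a square similarity matrix whose diagonal holds the matching text-video pairs (the function name and dict keys are illustrative, not from any specific benchmark toolkit):

```python
import numpy as np

def retrieval_metrics(sim):
    """Text-to-video retrieval metrics from a similarity matrix.

    sim[i, j] = similarity between text query i and video j; the matching
    video for query i is assumed to sit at index i (diagonal ground truth).
    Returns R@1/R@5/R@10 as percentages and the 1-based mean rank.
    """
    sim = np.asarray(sim, dtype=float)
    n = sim.shape[0]
    # Ground-truth score for each query is on the diagonal.
    gt = sim[np.arange(n), np.arange(n)]
    # Rank = number of videos scored strictly higher than the match, plus one.
    ranks = (sim > gt[:, None]).sum(axis=1) + 1
    return {
        "R@1": 100.0 * float(np.mean(ranks <= 1)),
        "R@5": 100.0 * float(np.mean(ranks <= 5)),
        "R@10": 100.0 * float(np.mean(ranks <= 10)),
        "MeanR": float(ranks.mean()),
    }
```

For example, with two queries where the first match is outranked by one distractor, the ranks are [2, 1], giving R@1 = 50.0 and MeanR = 1.5. Published numbers may differ slightly in tie-breaking or in using median rank instead of mean rank.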
Showing 20 of 38 results.