Video Retrieval on MSVD
Metrics
text-to-video R@1
video-to-text R@1
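R@1 is recall at rank 1: the percentage of queries for which the correct item is ranked first. The sketch below shows one common way to compute it from a query-by-candidate similarity matrix. It assumes a one-to-one pairing in which the correct candidate for query i is candidate i; MSVD provides several captions per video, and the papers listed here may follow different evaluation protocols, so this is an illustration and not the procedure behind any reported number. The names `recall_at_k` and `transpose` and the toy scores are hypothetical.

```python
def recall_at_k(sim, k=1):
    """Percentage of queries whose correct candidate lands in the top k.

    sim[i][j] is the similarity between query i and candidate j.
    Assumption: the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct candidate = number of candidates scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(matrix):
    """Swap rows and columns so videos become the queries."""
    return [list(col) for col in zip(*matrix)]


# Toy text-by-video similarity matrix (made-up scores, 3 captions x 3 videos).
sim = [
    [0.9, 0.1, 0.2],
    [0.3, 0.2, 0.8],
    [0.1, 0.4, 0.7],
]

t2v_r1 = recall_at_k(sim, k=1)             # text-to-video: rows are text queries
v2t_r1 = recall_at_k(transpose(sim), k=1)  # video-to-text: rows are video queries
print(f"text-to-video R@1: {t2v_r1:.1f}")
print(f"video-to-text R@1: {v2t_r1:.1f}")
```

The same function covers both directions because video-to-text retrieval only swaps which side acts as the query.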
Results
Performance results of various models on this benchmark
| Model name | text-to-video R@1 | video-to-text R@1 | Paper Title |
|---|---|---|---|
| InternVideo2-6B | 61.4 | 85.2 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| HunYuan_tvr (huge) | 59.0 | 73.0 | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations |
| InternVideo | 58.4 | 76.3 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| HunYuan_tvr | 58.2 | 69.1 | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations |
| vid-TLDR (UMT-L) | 57.9 | 82.7 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer |
| VLAB | 57.5 | - | VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending |
| MDMMT-2 | 56.8 | - | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization |
| Side4Video | 56.1 | - | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning |
| CAMoE | 51.8 | 69.3 | Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss |
| Cap4Video | 51.8 | 70.0 | Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? |
| CenterCLIP (ViT-B/16) | 50.6 | 68.4 | CenterCLIP: Token Clustering for Efficient Text-Video Retrieval |
| X-CLIP | 50.4 | 66.8 | X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval |
| DMAE (ViT-B/32) | 48.7 | - | Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning |
| QB-Norm+CLIP2Video | 48.0 | - | Cross Modal Retrieval with Querybank Normalisation |
| DiffusionRet+QB-Norm | 47.9 | 60.3 | DiffusionRet: Generative Text-Video Retrieval with Diffusion Model |
| PAU | 47.3 | 68.9 | Prototype-based Aleatoric Uncertainty Quantification for Cross-modal Retrieval |
| X-Pool | 47.2 | 66.4 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval |
| DiffusionRet | 46.6 | 61.9 | DiffusionRet: Generative Text-Video Retrieval with Diffusion Model |
| CLIP4Clip | 46.2 | 62.0 | CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval |
| LAFF | 45.4 | - | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval |
Video Retrieval On Msvd | SOTA | HyperAI