Video Retrieval on YouCook2
Evaluation metrics
text-to-video Median Rank
text-to-video R@1
text-to-video R@10
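For readers unfamiliar with these metrics, the following is a minimal sketch of how R@K and Median Rank are commonly computed for text-to-video retrieval. It assumes the usual setup where each text query has exactly one ground-truth video and `sim[i][i]` holds the score of the correct pair; the function names, the toy similarity matrix, and the optimistic handling of ties are illustrative assumptions, not the evaluation code of any paper listed on this page. Higher R@K is better; lower Median Rank is better.

```python
from statistics import median


def retrieval_ranks(sim):
    """Return the 1-based rank of the correct video for each text query.

    Assumption: sim[i][j] is the similarity between text query i and video j,
    and video i is the single ground-truth match for query i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # Rank = 1 + number of videos scored strictly higher than the correct one
        # (ties are resolved in favour of the correct video).
        ranks.append(1 + sum(1 for score in row if score > row[i]))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct video appears in the top k results."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median position of the correct video across all queries."""
    return median(ranks)


# Hypothetical 3-query, 3-video similarity matrix, for illustration only.
sim = [
    [0.9, 0.1, 0.3],
    [0.8, 0.5, 0.2],
    [0.1, 0.7, 0.4],
]
ranks = retrieval_ranks(sim)
print("ranks:", ranks)
print("R@1:", recall_at_k(ranks, 1))
print("R@10:", recall_at_k(ranks, 10))
print("Median Rank:", median_rank(ranks))
```

Individual papers may differ in details such as tie handling or whether retrieval is done at clip or video level, so numbers from different sources are not always directly comparable.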
Evaluation results
Performance of each model on this benchmark
| Model | text-to-video Median Rank | text-to-video R@1 | text-to-video R@10 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| COOT | 9 | 16.7 | 52.3 | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning | |
| Text-Video Embedding | 24 | 8.2 | 35.3 | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | |
| HGLMM FV CCA | 75 | 4.6 | 21.6 | Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors | - |
| Satar et al. | 77 | 5.3 | 20.8 | Semantic Role Aware Correlation Transformer for Text to Video Retrieval | |
| RoME | 53 | 6.3 | 25.2 | RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | |
| VideoCLIP | - | 32.2 | 75.0 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | |
| VLM | 4 | 27.05 | 69.38 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding | |
| TACo | 4 | 29.6 | 72.7 | TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - |
| VAST | - | 50.4 | 80.8 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset | |
| OmniVec (pretrained) | - | - | 64.2 | OmniVec: Learning robust representations with cross modal sharing | - |
| UniVL + MELTR | 3 | 33.7 | 74.8 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | |
| OmniVec | - | - | 70.8 | OmniVec: Learning robust representations with cross modal sharing | - |
| UniVL | 4 | 28.9 | 70.0 | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation | |
| VideoCLIP (zero-shot) | - | 22.7 | 63.1 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding | |
| VideoCoCa (zero-shot) | - | 21.7 | 55.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| MDMMT-2 | 3.0 | 32.0 | 74.8 | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | - |