Video Retrieval On Youcook2
Evaluation metrics
text-to-video Median Rank
text-to-video R@1
text-to-video R@10
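As a rough illustration of how these three metrics are commonly computed, the sketch below derives Recall@K (R@1, R@10) and Median Rank from a text-to-video similarity matrix. It is only a sketch under stated assumptions: the function names, the toy matrix, and the convention that query i matches video i are hypothetical and are not taken from any paper listed on this page. Individual papers may break ties or define the test split differently.

```python
from statistics import median


def retrieval_ranks(similarity):
    """Return the 1-based rank of the correct video for each text query.

    Assumption: similarity[i][j] is the score between text query i and
    video j, and the ground-truth match for query i is video i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank = 1 + number of videos scored strictly higher than the
        # correct one (ties are resolved in favour of the correct video).
        ranks.append(1 + sum(1 for score in row if score > target))
    return ranks


def recall_at_k(ranks, k):
    """Percentage of queries whose correct video is ranked in the top k."""
    return 100.0 * sum(1 for r in ranks if r <= k) / len(ranks)


def median_rank(ranks):
    """Median rank of the correct video over all queries (lower is better)."""
    return median(ranks)


# Toy 4x4 similarity matrix with made-up scores, for illustration only.
similarity = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.2, 0.3, 0.7, 0.1],  # correct video ranked 1st
    [0.6, 0.7, 0.8, 0.4],  # correct video ranked 4th
]

ranks = retrieval_ranks(similarity)
print("ranks:", ranks)
print("R@1:", recall_at_k(ranks, 1))
print("R@2:", recall_at_k(ranks, 2))
print("Median Rank:", median_rank(ranks))
```

Higher R@K is better, while a lower Median Rank is better, which is why the two kinds of columns in the results move in opposite directions.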
Results
Performance of each model on this benchmark
| Model | text-to-video Median Rank | text-to-video R@1 | text-to-video R@10 | Paper Title |
| --- | --- | --- | --- | --- |
| VAST | - | 50.4 | 80.8 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset |
| VideoCLIP | - | 32.2 | 75.0 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding |
| UniVL + MELTR | 3 | 33.7 | 74.8 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models |
| MDMMT-2 | 3.0 | 32.0 | 74.8 | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization |
| TACo | 4 | 29.6 | 72.7 | TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment |
| OmniVec | - | - | 70.8 | OmniVec: Learning robust representations with cross modal sharing |
| UniVL | 4 | 28.9 | 70.0 | UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation |
| VLM | 4 | 27.05 | 69.38 | VLM: Task-agnostic Video-Language Model Pre-training for Video Understanding |
| OmniVec (pretrained) | - | - | 64.2 | OmniVec: Learning robust representations with cross modal sharing |
| VideoCLIP (zero-shot) | - | 22.7 | 63.1 | VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding |
| VideoCoCa (zero-shot) | - | 21.7 | 55.2 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners |
| COOT | 9 | 16.7 | 52.3 | COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning |
| Text-Video Embedding | 24 | 8.2 | 35.3 | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips |
| RoME | 53 | 6.3 | 25.2 | RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval |
| HGLMM FV CCA | 75 | 4.6 | 21.6 | Associating Neural Word Embeddings With Deep Image Representations Using Fisher Vectors |
| Satar et al. | 77 | 5.3 | 20.8 | Semantic Role Aware Correlation Transformer for Text to Video Retrieval |