HyperAI超神经
Zero-Shot Video Retrieval on ActivityNet
Evaluation Metrics
text-to-video R@1
video-to-text R@1
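R@1 (Recall at 1) is the percentage of queries whose correct match is ranked first. The sketch below shows one common way to compute it from a text-by-video similarity matrix. It assumes a one-to-one pairing in which text i matches video i; the function names and the toy scores are illustrative and are not taken from any of the listed papers.

```python
def recall_at_k(similarity, k=1):
    """Percentage of queries whose ground-truth candidate ranks in the top k.

    similarity[i][j] is the score between query i and candidate j.
    Assumption: the correct candidate for query i sits at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Count candidates that score strictly higher than the true match.
        better = sum(1 for score in row if score > row[i])
        if better < k:
            hits += 1
    return 100.0 * hits / len(similarity)


def transpose(matrix):
    """Swap rows and columns so videos become the queries."""
    return [list(col) for col in zip(*matrix)]


# Toy text-by-video similarity matrix (made-up numbers).
sim = [
    [0.9, 0.1, 0.3],
    [0.5, 0.4, 0.3],
    [0.1, 0.2, 0.7],
]

t2v_r1 = recall_at_k(sim, k=1)             # text-to-video R@1
v2t_r1 = recall_at_k(transpose(sim), k=1)  # video-to-text R@1
print(f"text-to-video R@1: {t2v_r1:.1f}")
print(f"video-to-text R@1: {v2t_r1:.1f}")
```

The two directions can differ, as they do here, because ranking along rows and ranking along columns of the same matrix are separate retrieval problems.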
Evaluation Results
Performance of each model on this benchmark
| Model | text-to-video R@1 | video-to-text R@1 | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| InternVideo | 30.7 | 31.4 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | |
| VideoCoCa | 34.5 | 33.0 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| vid-TLDR (UMT-L) | 42.8 | 41.2 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | |
| GRAM | 59.0 | 50.9 | Gramian Multimodal Representation Learning and Alignment | |
| InternVideo2-1B | 60.4 | 54.8 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| LanguageBind(ViT-L/14) | 38.4 | 35.7 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| UMT-L (ViT-L/16) | 42.8 | 40.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| Singularity-temporal-17M | 30.6 | - | Revealing Single Frame Bias for Video-and-Language Learning | |
| LanguageBind(ViT-H/14) | 41.0 | 39.1 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| InternVideo2-6B | 63.2 | 56.5 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| BT-Adapter | 37.0 | - | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning | |
| Singularity-temporal-5M | 30.8 | - | Revealing Single Frame Bias for Video-and-Language Learning | |