HyperAI超神经

Video Retrieval on VATEX

Evaluation Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
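
For reference, text-to-video R@K (Recall at K) is commonly computed as the percentage of text queries whose ground-truth video appears among the top K retrieved videos. The sketch below is a minimal illustration of that definition. The function name `recall_at_k`, the toy similarity matrix, and the assumption that query i has exactly one correct video (video i) are all illustrative choices, not taken from the evaluation code of any model listed on this page.

```python
from typing import Sequence


def recall_at_k(similarity: Sequence[Sequence[float]], k: int) -> float:
    """Return text-to-video Recall@K as a percentage.

    Assumption (illustrative): similarity[i][j] scores text query i against
    video j, and video i is the single correct match for query i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score
        # strictly higher than it for this query.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Hypothetical toy scores: 4 text queries x 4 videos.
sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.1, 0.2, 0.7, 0.3],  # correct video ranked 1st
    [0.6, 0.5, 0.4, 0.3],  # correct video ranked 4th
]

for k in (1, 5, 10):
    print(f"text-to-video R@{k}: {recall_at_k(sim, k):.1f}")
```

Higher values are better for all three metrics, and R@1 is the strictest, since it counts a query only when the correct video is ranked first.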

Evaluation Results

Performance of each model on this benchmark

Comparison Table

| Model | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 |
| --- | --- | --- | --- |
| vast-a-vision-audio-subtitle-text-omni-1 | 83.0 | 99.2 | 98.2 |
| cross-modal-retrieval-with-querybank | 58.8 | 93.8 | - |
| clip2video-mastering-video-text-retrieval-via | 57.3 | 90 | - |
| side4video-spatial-temporal-side-network-for | 68.8 | 97.0 | 93.5 |
| valor-vision-audio-language-omni-perception | 78.5 | 98.7 | 97.1 |
| cap4video-what-can-auxiliary-captions-do-for | 66.6 | 97.0 | 93.1 |
| internvideo2-scaling-video-foundation-models | 75.5 | - | - |
| gramian-multimodal-representation-learning | 87.7 | 100 | - |
| ts2-net-token-shift-and-selection-transformer | 59.1 | 95.2 | - |
| lightweight-attentional-feature-fusion-for | 59.1 | 91.7 | - |
| unmasked-teacher-towards-training-efficient | 72 | 97.8 | 95.1 |
| internvideo-general-video-foundation-models | 71.1 | - | - |
| holistic-features-are-almost-sufficient-for | 63.6 | 96.1 | 91.9 |