
Video Retrieval on VATEX

Evaluation Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
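R@K (Recall at K) is the percentage of text queries for which the correct video appears among the top K retrieved videos. The following is a minimal sketch of that computation, not the benchmark's official evaluation script. The function name `recall_at_k`, the toy score matrix, and the assumption that the correct video for query i is video i are all illustrative.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose ground-truth video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Rank = number of videos scoring strictly higher than the correct one
        # (ties are resolved in favor of the correct video).
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 score matrix (made-up numbers, not benchmark data).
toy_scores = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.3],  # query 1: correct video ranked 2nd
    [0.7, 0.6, 0.4],  # query 2: correct video ranked 3rd
]
r1 = recall_at_k(toy_scores, 1)
r2 = recall_at_k(toy_scores, 2)
r3 = recall_at_k(toy_scores, 3)
```

In this toy case one of the three queries is a hit at K=1, two at K=2, and all three at K=3, so recall can only stay the same or rise as K grows.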

Evaluation Results

Performance of each model on this benchmark

Comparison Table

| Model | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 |
|---|---|---|---|
| vast-a-vision-audio-subtitle-text-omni-1 | 83.0 | 99.2 | 98.2 |
| cross-modal-retrieval-with-querybank | 58.8 | 93.8 | - |
| clip2video-mastering-video-text-retrieval-via | 57.3 | 90 | - |
| side4video-spatial-temporal-side-network-for | 68.8 | 97.0 | 93.5 |
| valor-vision-audio-language-omni-perception | 78.5 | 98.7 | 97.1 |
| cap4video-what-can-auxiliary-captions-do-for | 66.6 | 97.0 | 93.1 |
| internvideo2-scaling-video-foundation-models | 75.5 | - | - |
| gramian-multimodal-representation-learning | 87.7 | 100 | - |
| ts2-net-token-shift-and-selection-transformer | 59.1 | 95.2 | - |
| lightweight-attentional-feature-fusion-for | 59.1 | 91.7 | - |
| unmasked-teacher-towards-training-efficient | 72 | 97.8 | 95.1 |
| internvideo-general-video-foundation-models | 71.1 | - | - |
| holistic-features-are-almost-sufficient-for | 63.6 | 96.1 | 91.9 |