HyperAI

Video Retrieval on ActivityNet

Metrics

text-to-video Median Rank
text-to-video R@1
text-to-video R@5
text-to-video R@50
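
For readers unfamiliar with these metrics, the following is a minimal sketch of how R@K and Median Rank are commonly computed from a text-to-video similarity matrix. It is an illustration under stated assumptions, not the evaluation code of any listed model: the function name `retrieval_metrics`, the toy matrix, and the convention that query i matches video i are all hypothetical, and ties are resolved optimistically by counting only strictly higher scores.

```python
from statistics import median


def retrieval_metrics(sim, ks=(1, 5, 50)):
    """Compute R@K (in percent) and Median Rank for text-to-video retrieval.

    Assumption: sim[i][j] is the similarity between text query i and video j,
    and the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        correct = row[i]
        # Rank of the correct video: 1 + number of videos scoring strictly higher.
        ranks.append(1 + sum(1 for score in row if score > correct))

    # R@K: percentage of queries whose correct video is ranked within the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / len(ranks) for k in ks}
    # Median Rank: median position of the correct video (lower is better).
    metrics["Median Rank"] = median(ranks)
    return metrics


# Toy 3x3 example (made-up scores): queries 0 and 2 rank their video first,
# query 1 ranks its video third.
toy = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.1, 0.2, 0.7],
]
result = retrieval_metrics(toy, ks=(1, 2))
print(result)
```

Higher R@K is better, while a lower Median Rank is better, which is why the two kinds of columns in the results should be read in opposite directions.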

Results

Performance results of various models on this benchmark

Comparison table
| Model name | text-to-video Median Rank | text-to-video R@1 | text-to-video R@5 | text-to-video R@50 |
| --- | --- | --- | --- | --- |
| advancing-high-resolution-video-language | 4 | 28.5 | 57.4 | 94 |
| revealing-single-frame-bias-for-video-and | - | 47.1 | 75.5 | - |
| rtq-rethinking-video-language-understanding | - | 53.5 | 81.4 | - |
| multi-modal-transformer-for-video-retrieval | 3.3 | 28.7 | 61.4 | 94.5 |
| video-and-text-matching-with-conditioned | - | 25.4 | 59.1 | - |
| improving-video-text-retrieval-by-multi | 1 | 51.0 | 77.7 | - |
| internvideo-general-video-foundation-models | - | 62.2 | - | - |
| diffusionret-generative-text-video-retrieval | 2.0 | 45.8 | 75.6 | - |
| x-clip-end-to-end-multi-grained-contrastive | - | 46.2 | 75.5 | - |
| hitea-hierarchical-temporal-aware-video | - | 49.7 | 77.1 | - |
| valor-vision-audio-language-omni-perception | - | 70.1 | 90.8 | - |
| clip4clip-an-empirical-study-of-clip-for-end | 2 | 40.5 | 73.4 | 98.2 |
| video-text-as-game-players-hierarchical | 2.0 | 42.2 | 73.0 | - |
| expectation-maximization-contrastive-learning | - | 41.2 | 72.7 | - |
| expectation-maximization-contrastive-learning | - | 50.6 | 78.7 | 98.1 |
| cosa-concatenated-sample-pretrained-vision | - | 67.3 | - | - |
| diffusionret-generative-text-video-retrieval | 2.0 | 48.1 | - | - |
| multi-modal-transformer-for-video-retrieval | 5 | 22.7 | 54.2 | 93.2 |
| clip-vip-adapting-pre-trained-image-text | 1 | 61.4 | 85.7 | - |
| dual-modal-attention-enhanced-text-video | 1.0 | 53.4 | 80.7 | - |
| hunyuan-tvr-for-text-video-retrivial | 1 | 57.3 | 84.8 | - |
| vid-tldr-training-free-token-merging-for | - | 66.7 | 88.6 | - |
| unmasked-teacher-towards-training-efficient | - | 66.8 | 89.1 | - |
| vindlu-a-recipe-for-effective-video-and | - | 55.0 | 81.4 | - |
| testa-temporal-spatial-token-aggregation-for | - | 54.8 | 80.8 | - |
| centerclip-token-clustering-for-efficient | 2 | 46.2 | 77.0 | - |
| gramian-multimodal-representation-learning | - | 69.9 | - | - |
| taco-token-aware-cascade-contrastive-learning | 3.0 | 30.4 | 61.2 | 93.4 |
| internvideo2-scaling-video-foundation-models | - | 74.1 | - | - |
| vast-a-vision-audio-subtitle-text-omni-1 | - | 70.5 | 90.9 | - |
| use-what-you-have-video-retrieval-using | 6 | 20.5 | 47.7 | 91.4 |