HyperAI

Video Retrieval on DiDeMo

Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
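As a rough illustration of the R@K metrics listed above, here is a minimal sketch assuming the usual definition of text-to-video Recall@K: the percentage of text queries whose ground-truth video appears among the K highest-scoring videos. The function name `recall_at_k`, the similarity-matrix layout, and the toy numbers are assumptions made for this example; they do not come from the benchmark or from any listed model.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose ground-truth video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption for this sketch: the ground-truth video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers, not taken from this leaderboard).
toy = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video ranked 2nd
    [0.7, 0.6, 0.1],  # query 2: correct video ranked 3rd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(toy, k):.1f}")
```

Higher is better for all three metrics, and R@1 <= R@5 <= R@10 holds for any single model because a larger K can only admit more hits.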

Results

Performance results of various models on this benchmark

Comparison table
| Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 |
|---|---|---|---|
| rtq-rethinking-video-language-understanding | 57.6 | 89.9 | 84.1 |
| frozen-in-time-a-joint-video-and-image | 31.0 | 72.4 | 59.8 |
| diffusionret-generative-text-video-retrieval | 48.9 | 83.3 | 75.5 |
| hunyuan-tvr-for-text-video-retrivial | 52.7 | 85.2 | 77.8 |
| advancing-high-resolution-video-language | 28.8 | 69.1 | 57.4 |
| cross-modal-retrieval-with-querybank | 43.5 | 80.9 | 71.4 |
| cap4video-what-can-auxiliary-captions-do-for | 52.0 | 87.5 | 79.4 |
| revisiting-temporal-modeling-for-clip-based | 54.6 | 85.1 | 78.4 |
| clip-vip-adapting-pre-trained-image-text | 55.3 | 89.3 | 82 |
| align-and-prompt-video-and-language-pre | 35.9 | 78.8 | 67.5 |
| disentangled-representation-learning-for-text | 49.0 | 84.5 | 76.5 |
| vast-a-vision-audio-subtitle-text-omni-1 | 72.0 | 91.4 | 89.0 |
| dual-modal-attention-enhanced-text-video | 52.7 | 86.6 | 79.3 |
| mplug-2-a-modularized-multi-modal-foundation | 56.4 | 85.2 | 79.1 |
| clover-towards-a-unified-video-language | 50.1 | 85.6 | 76.7 |
| vindlu-a-recipe-for-effective-video-and | 61.2 | 91.0 | 85.8 |
| use-what-you-have-video-retrieval-using | 16.1 | 54.4 | 41.1 |
| multi-efficient-video-and-language | 56.5 | 87.0 | 80.2 |
| unmasked-teacher-towards-training-efficient | 70.4 | 93.5 | 90.1 |
| hitea-hierarchical-temporal-aware-video | 56.5 | 89.7 | 81.7 |
| hunyuan-tvr-for-text-video-retrivial | 52.1 | 85.7 | 78.2 |
| revealing-single-frame-bias-for-video-and | 53.9 | 86.9 | 79.4 |
| an-empirical-study-of-end-to-end-video | 47.9 | 84.1 | 76.5 |
| cosa-concatenated-sample-pretrained-vision | 70.5 | - | - |
| gramian-multimodal-representation-learning | 67.3 | 90.1 | - |
| x-clip-end-to-end-multi-grained-contrastive | 47.8 | - | 79.3 |
| internvideo-general-video-foundation-models | 57.9 | - | - |
| testa-temporal-spatial-token-aggregation-for | 61.2 | 91.5 | 87.2 |
| prototype-based-aleatoric-uncertainty-1 | 48.6 | 84.5 | 76.0 |
| valor-vision-audio-language-omni-perception | 61.5 | 90.4 | 85.3 |
| video-text-as-game-players-hierarchical | 46.9 | 82.7 | 74.9 |
| vid-tldr-training-free-token-merging-for | 72.3 | 94.2 | 91.2 |
| vlab-enhancing-video-language-pre-training-by | 56.8 | 88.7 | 81.6 |
| rudder-a-cross-lingual-video-and-text | 16.3 | 56.5 | - |
| omnivl-one-foundation-model-for-image | 52.4 | 85.4 | 79.5 |
| diffusionret-generative-text-video-retrieval | 46.7 | 82.7 | 74.7 |
| improving-video-text-retrieval-by-multi | 43.8 | 79.9 | 71.4 |
| Model 38 | - | 85.3 | 77.4 |
| clip4clip-an-empirical-study-of-clip-for-end | 43.4 | 80.6 | 70.2 |
| internvideo2-scaling-video-foundation-models | 74.2 | - | - |