HyperAI

Video Retrieval on MSR-VTT-1kA

Metrics

text-to-video Median Rank
text-to-video R@1
text-to-video R@10
text-to-video R@5
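These are standard text-to-video retrieval measures. For each text query, the test videos are ranked by similarity. R@K is the percentage of queries whose correct video appears in the top K, and Median Rank is the median position of the correct video. Higher R@K and lower Median Rank are better. The sketch below computes these metrics from a query-by-video similarity matrix. It is a minimal illustration that assumes exactly one ground-truth video per text query, located at the same index (query i matches video i). The function name `retrieval_metrics` and the toy scores are hypothetical and are not taken from any benchmark's evaluation code.

```python
from statistics import median


def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Text-to-video R@K (in %) and median rank from a similarity matrix.

    sim[i][j] is the score between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        # 1-based rank of the ground-truth video: 1 + number of videos
        # scored strictly higher (ties are resolved in the query's favour).
        rank = 1 + sum(1 for s in row if s > row[i])
        ranks.append(rank)

    n = len(ranks)
    # R@K: percentage of queries whose correct video is within the top K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    # Median Rank: median of the 1-based ranks (lower is better).
    metrics["MdR"] = median(ranks)
    return metrics


# Hypothetical toy scores for 3 text queries x 3 videos.
sim = [
    [0.9, 0.1, 0.2],  # correct video ranked 1st
    [0.8, 0.3, 0.1],  # correct video ranked 2nd
    [0.5, 0.6, 0.4],  # correct video ranked 3rd
]
# ranks are 1, 2, 3 -> R@1 = 33.3, R@5 = R@10 = 100.0, MdR = 2
print(retrieval_metrics(sim))
```

With 1,000 test queries, R@K is simply the count of hits in the top K divided by 10. This is why the values in the table are percentages.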

Results

Performance results of various models on this benchmark

Comparison table

Model name | text-to-video Median Rank | text-to-video R@1 | text-to-video R@10 | text-to-video R@5
meltr-meta-loss-transformer-for-learning-to | 4 | 31.1 | 68.3 | 55.7
side4video-spatial-temporal-side-network-for | 1.0 | 52.3 | 84.2 | 75.5
clip2tv-an-empirical-study-on-transformer | 1 | 52.9 | 86.5 | 78.5
omnivec-learning-robust-representations-with | - | - | 89.4 | -
hitea-hierarchical-temporal-aware-video | - | 46.8 | 81.9 | 71.2
lightweight-attentional-feature-fusion-for | - | 45.8 | 82 | 71.5
holistic-features-are-almost-sufficient-for | - | 48.0 | 83.5 | 75.9
cots-collaborative-two-stream-vision-language | 2 | 36.8 | 73.2 | 63.8
florence-a-new-foundation-model-for-computer | - | 37.6 | 72.6 | 63.8
revealing-single-frame-bias-for-video-and | - | 41.5 | 77 | 68.7
x-clip-end-to-end-multi-grained-contrastive | 2.0 | 49.3 | 84.8 | 75.8
clip4clip-an-empirical-study-of-clip-for-end | 2 | - | 81.6 | -
taco-token-aware-cascade-contrastive-learning | 4 | 28.4 | 71.2 | 57.8
mdmmt-multidomain-multimodal-transformer-for | 2 | 38.9 | 79.7 | 69.0
rtq-rethinking-video-language-understanding | - | 53.4 | 84.4 | 76.1
unified-coarse-to-fine-alignment-for-video | - | 49.4 | 83.5 | 72.1
meltr-meta-loss-transformer-for-learning-to | - | 41.3 | 82.5 | 73.5
pidro-parallel-isomeric-attention-with | 1.0 | 55.9 | 87.6 | 79.8
revisiting-temporal-modeling-for-clip-based | 1 | 54.1 | 87.8 | 79.5
frozen-in-time-a-joint-video-and-image | 3 | 31.0 | 70.5 | 59.5
prototype-based-aleatoric-uncertainty-1 | 2.0 | 48.5 | 82.5 | 72.7
cross-modal-retrieval-with-querybank | 2 | 47.2 | 83.0 | 73.0
ts2-net-token-shift-and-selection-transformer | - | 54.0 | 87.4 | 79.3
expectation-maximization-contrastive-learning | - | 46.8 | 83.1 | 73.1
meltr-meta-loss-transformer-for-learning-to | 3 | 35.5 | 78.4 | 67.2
centerclip-token-clustering-for-efficient | 2 | 48.4 | 82.0 | 73.8
bridgeformer-bridging-video-text-retrieval | 7 | 26 | 56.4 | 46.4
video-text-as-game-players-hierarchical | 2.0 | 48.6 | 83.4 | 74.6
disentangled-representation-learning-for-text | 1 | 53.3 | 87.6 | 80.3
expectation-maximization-contrastive-learning | - | 51.6 | 85.3 | 78.1
holistic-features-are-almost-sufficient-for | - | 46.8 | 82.6 | 74.3
clip-vip-adapting-pre-trained-image-text | 1.0 | 57.7 | 88.2 | 80.5
a-straightforward-framework-for-video | 4 | 31.2 | 64.2 | 53.7
x-pool-cross-modal-language-video-attention | 2 | 46.9 | 82.2 | 72.8
vindlu-a-recipe-for-effective-video-and | - | 46.5 | 80.4 | 71.5
clip2video-mastering-video-text-retrieval-via | 2 | 45.6 | 81.7 | 72.6
improving-video-text-retrieval-by-multi | 2 | 48.8 | 85.3 | 75.6
mplug-2-a-modularized-multi-modal-foundation | - | 53.1 | 84.7 | 77.6
a-joint-sequence-fusion-model-for-video | 13 | 10.2 | 43.2 | 31.2
use-what-you-have-video-retrieval-using | 6 | 20.9 | 62.4 | 48.8
diffusionret-generative-text-video-retrieval | 2.0 | 49.0 | 82.7 | 75.2
diffusionret-generative-text-video-retrieval | 2.0 | 48.9 | 83.1 | 75.2
dual-modal-attention-enhanced-text-video | 1.0 | 55.5 | 87.1 | 79.4
hunyuan-tvr-for-text-video-retrivial | - | 55.0 | - | -
socratic-models-composing-zero-shot | - | - | - | -
cap4video-what-can-auxiliary-captions-do-for | 1 | 51.4 | 83.9 | 75.7
towards-efficient-and-effective-text-to-video | - | 54.1 | 86.9 | 78.8
bridgeformer-bridging-video-text-retrieval | 3 | 37.6 | 75.1 | 64.8
omnivec-learning-robust-representations-with | - | - | 78.6 | -
vlm-task-agnostic-video-language-model-pre | 4 | 28.10 | 67.40 | 55.50
howto100m-learning-a-text-video-embedding-by | 9 | 14.9 | 52.8 | 40.2
x-2-vlm-all-in-one-pre-trained-model-for | - | 49.6 | 84.2 | 76.7
hunyuan-tvr-for-text-video-retrivial | 1.0 | 62.9 | 90.8 | 84.5
x-2-vlm-all-in-one-pre-trained-model-for | - | 47.6 | 84.2 | 74.1
masked-contrastive-pre-training-for-efficient | 3 | 38.9 | 73.9 | 63.1
multi-efficient-video-and-language | - | 54.7 | 86.0 | 77.7
multi-modal-transformer-for-video-retrieval | 4 | 26.6 | 69.6 | 57.1
multi-modal-transformer-for-video-retrieval | 4 | 24.6 | 67.1 | 54.0
all-in-one-exploring-unified-video-language | - | 37.9 | 77.1 | 68.1
videoclip-contrastive-pre-training-for-zero | - | 30.9 | 66.8 | 55.4
howto100m-learning-a-text-video-embedding-by | 12 | 12.1 | 48.0 | 35.0
clover-towards-a-unified-video-language | 2 | 40.5 | 79.4 | 69.8
video-text-retrieval-by-supervised-multi | - | 49.8 | 83.9 | 75.1