HyperAI

Video Retrieval on MSR-VTT

Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
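
For readers unfamiliar with these metrics, text-to-video R@K (Recall at K) is the percentage of text queries for which the correct video appears among the top K retrieved videos. The following is a minimal sketch of how such a score might be computed, not the benchmark's official evaluation code. The function name `recall_at_k`, the similarity-matrix input, and the assumption that query i has exactly one correct video at index i are all illustrative choices, not taken from this page.

```python
from typing import Dict, List, Sequence


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    ks: Sequence[int] = (1, 5, 10),
) -> Dict[int, float]:
    """Return text-to-video Recall@K as percentages.

    similarity[i][j] is the score between text query i and video j.
    Assumption (hypothetical setup): the correct video for query i is video i.
    """
    if not similarity:
        raise ValueError("similarity matrix is empty")

    ranks: List[int] = []
    for i, row in enumerate(similarity):
        target = row[i]
        # Rank of the correct video: 1 + number of videos scored strictly higher.
        # Ties are resolved in favor of the correct video (an optimistic choice).
        ranks.append(1 + sum(1 for score in row if score > target))

    total = len(ranks)
    return {k: 100.0 * sum(1 for r in ranks if r <= k) / total for k in ks}


# Toy example: 3 text queries scored against 3 videos.
toy_similarity = [
    [0.9, 0.1, 0.2],  # correct video ranked 1st
    [0.8, 0.3, 0.5],  # correct video ranked 3rd
    [0.1, 0.7, 0.6],  # correct video ranked 2nd
]
toy_scores = recall_at_k(toy_similarity, ks=(1, 2, 3))
```

In the toy example, one of three queries ranks its video first, so R@1 is about 33.3, while R@3 is 100. Published results may handle ties or multiple captions per video differently, so this sketch should not be expected to reproduce the numbers reported on the leaderboard.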

Results

Performance results of various models on this benchmark

Comparison table
| Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 |
| --- | --- | --- | --- |
| audio-enhanced-text-to-video-retrieval-using | 52 | 86.1 | 76.6 |
| video-text-modeling-with-zero-shot-transfer | 34.3 | 67.0 | 57.8 |
| taco-token-aware-cascade-contrastive-learning | 24.8 | 64.0 | 52.1 |
| an-empirical-study-of-end-to-end-video | 37.2 | 75.8 | 64.8 |
| cosa-concatenated-sample-pretrained-vision | 57.9 | - | - |
| coca-contrastive-captioners-are-image-text | 30.0 | 61.6 | 52.4 |
| a-straightforward-framework-for-video | 21.4 | 50.4 | 41.1 |
| rome-role-aware-mixture-of-expert-transformer | 10.7 | 41.2 | 29.6 |
| internvideo2-scaling-video-foundation-models | 62.8 | - | - |
| learning-language-visual-embedding-for-movie | 4.2 | 19.9 | - |
| valor-vision-audio-language-omni-perception | 59.9 | 89.6 | 83.5 |
| gramian-multimodal-representation-learning | 64 | 89.3 | - |
| Model 13 | 52.4 | 82 | 73.9 |
| temporal-tessellation-a-unified-approach-for | 4.7 | 24.1 | - |
| video-and-text-matching-with-conditioned | 26 | - | 56.7 |
| howto100m-learning-a-text-video-embedding-by | 14.9 | 52.8 | - |
| frozen-in-time-a-joint-video-and-image | 32.5 | 71.2 | 61.5 |
| lightweight-attentional-feature-fusion-for | 29.1 | 65.8 | 54.9 |
| meltr-meta-loss-transformer-for-learning-to | 38.6 | 84.7 | 74.4 |
| a-joint-sequence-fusion-model-for-video | 10.2 | 43.2 | - |
| cots-collaborative-two-stream-vision-language | 32.1 | 70.2 | 60.8 |
| unified-coarse-to-fine-alignment-for-video | 49.4 | 83.5 | 72.1 |
| vid-tldr-training-free-token-merging-for | 58.1 | 81.6 | 81.0 |
| clip2tv-an-empirical-study-on-transformer | 33.1 | 68.9 | 58.9 |
| univilm-a-unified-video-and-language-pre | 21.2 | 63.1 | 49.6 |
| improving-video-text-retrieval-by-multi | 32.9 | 68.4 | 58.3 |
| use-what-you-have-video-retrieval-using | 10.0 | 41.2 | 29.0 |
| advancing-high-resolution-video-language | 35.6 | 78 | 65.3 |
| meltr-meta-loss-transformer-for-learning-to | 33.6 | 77.8 | 63.7 |
| omnivl-one-foundation-model-for-image | 47.8 | 83.8 | 74.2 |
| vlab-enhancing-video-language-pre-training-by | 55.1 | 87.6 | 78.8 |
| mdmmt-multidomain-multimodal-transformer-for | 23.1 | 61.8 | 49.8 |
| mdmmt-2-multidomain-multimodal-transformer | 33.7 | 70.8 | 60.5 |
| learning-joint-embedding-with-multimodal-cues | 7.0 | 29.7 | 20.9 |
| meltr-meta-loss-transformer-for-learning-to | 28.5 | 67.6 | 55.5 |
| internvideo-general-video-foundation-models | 55.2 | - | - |
| vast-a-vision-audio-subtitle-text-omni-1 | 63.9 | 89.6 | 84.3 |
| unmasked-teacher-towards-training-efficient | 58.8 | 87.1 | 81.0 |
| clip2video-mastering-video-text-retrieval-via | 29.8 | 66.2 | 55.5 |
| clip4clip-an-empirical-study-of-clip-for-end | 44.5 | 81.6 | 71.4 |