Zero-Shot Video Retrieval on MSVD
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
video-to-text R@1
video-to-text R@10
video-to-text R@5
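The R@K (Recall at K) metrics above report the percentage of queries whose correct match appears among the top K retrieved candidates. The following is a minimal sketch of that computation, not the evaluation code used by any listed model. It assumes a square similarity matrix in which the correct candidate for query i sits at index i; the names `recall_at_k` and `transpose` are hypothetical helpers introduced here for illustration. Actual MSVD protocols, where each video has several captions, may handle the pairing differently.

```python
from typing import List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int) -> float:
    """Percentage of queries whose correct candidate ranks in the top k.

    Assumption: sim[i][j] is the similarity between query i and candidate j,
    and the correct candidate for query i is candidate i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank = number of candidates scoring strictly higher than the correct one.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap queries and candidates, e.g. text-to-video becomes video-to-text."""
    return [list(col) for col in zip(*sim)]


# Toy example: 3 text queries (rows) scored against 3 videos (columns).
sim = [
    [0.9, 0.1, 0.2],
    [0.8, 0.3, 0.5],
    [0.2, 0.7, 0.6],
]

t2v_r1 = recall_at_k(sim, 1)             # text-to-video R@1
v2t_r1 = recall_at_k(transpose(sim), 1)  # video-to-text R@1
print(f"text-to-video R@1: {t2v_r1:.1f}")
print(f"video-to-text R@1: {v2t_r1:.1f}")
```

Transposing the matrix reuses the same function for both retrieval directions, which is why the benchmark reports text-to-video and video-to-text scores side by side.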
Results
Performance results of various models on this benchmark
Comparison table
Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | video-to-text R@1 | video-to-text R@10 | video-to-text R@5 |
---|---|---|---|---|---|---|
internvideo2-scaling-video-foundation-models | 58.1 | 88.4 | 83.0 | 83.3 | 96.9 | 94.3 |
vid-tldr-training-free-token-merging-for | 50.0 | 85.5 | 77.6 | 75.7 | 95.1 | 90.0 |
clip4clip-an-empirical-study-of-clip-for-end | 38.5 | 76.8 | 66.9 | - | - | - |
noise-estimation-using-density-estimation-for | 13.66 | 47.74 | 35.7 | - | - | - |
miles-visual-bert-pre-training-with-injected | 44.4 | 87.0 | 76.2 | - | - | - |
howtocaption-prompting-llms-to-transform | 44.5 | 82.1 | 73.3 | - | - | - |
bridgeformer-bridging-video-text-retrieval | 43.6 | 84.9 | 74.9 | - | - | - |
languagebind-extending-video-language | 53.9 | 87.8 | 80.4 | 72.0 | 96.3 | 91.4 |
howtocaption-prompting-llms-to-transform | 54.8 | 87.2 | 80.9 | - | - | - |
internvideo2-scaling-video-foundation-models | 59.3 | 89.6 | 84.4 | 83.1 | 97.0 | 94.2 |
unmasked-teacher-towards-training-efficient | 49.0 | 84.7 | 76.9 | 74.5 | 92.8 | 89.7 |
lat-latent-translation-with-cycle-consistency | 36.9 | 81.0 | 68.6 | 34.4 | 79.2 | 69.0 |
languagebind-extending-video-language | 54.1 | 88.1 | 81.1 | 69.7 | 97.9 | 91.8 |
internvideo-general-video-foundation-models | 43.4 | - | - | 67.6 | - | - |