Zero-Shot Video Retrieval on LSMDC
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
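
As a rough illustration of what the R@K metrics listed above measure: text-to-video R@K is the percentage of text queries for which the correct video appears among the top K retrieved videos. The sketch below is a minimal version under stated assumptions. The function name `recall_at_k`, the similarity-matrix layout (query i matches video i), and the toy scores are all hypothetical and not taken from this benchmark; the evaluation code used by the listed models may differ in details such as tie handling.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose correct video ranks in the top k.

    Assumption: similarity[i][j] scores text query i against video j,
    and the correct video for query i is at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank of the correct video: how many videos score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 4x4 similarity matrix (made-up values, for illustration only).
toy = [
    [0.9, 0.1, 0.3, 0.2],  # correct video ranked 1st
    [0.8, 0.5, 0.6, 0.1],  # correct video ranked 3rd
    [0.2, 0.4, 0.7, 0.1],  # correct video ranked 1st
    [0.9, 0.8, 0.7, 0.3],  # correct video ranked 4th
]

r_at_1 = recall_at_k(toy, 1)
r_at_3 = recall_at_k(toy, 3)
r_at_4 = recall_at_k(toy, 4)
print(r_at_1, r_at_3, r_at_4)
```

Because the top-K set grows with K, R@1 can never exceed R@5, and R@5 can never exceed R@10 for the same model.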
Results
Performance results of various models on this benchmark
Comparison Table
Model Name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 |
---|---|---|---|
one-for-all-video-conversation-is-feasible | 19.5 | 45.0 | 35.9 |
howtocaption-prompting-llms-to-transform | 17.3 | 38.6 | 31.7 |
hitea-hierarchical-temporal-aware-video | 18.3 | 44.2 | 36.7 |
bridgeformer-bridging-video-text-retrieval | 12.2 | 32.2 | 25.9 |
noise-estimation-using-density-estimation-for | 4.2 | 17.1 | 11.6 |
internvideo2-scaling-video-foundation-models | 33.8 | 62.2 | 55.9 |
clip4clip-an-empirical-study-of-clip-for-end | 15.1 | 36.4 | 28.5 |
unmasked-teacher-towards-training-efficient | 25.2 | 50.5 | 43.0 |
hitea-hierarchical-temporal-aware-video | 15.5 | 39.8 | 31.1 |
mplug-2-a-modularized-multi-modal-foundation | 24.1 | 52.0 | 43.8 |
miles-visual-bert-pre-training-with-injected | 11.1 | 30.6 | 24.7 |
internvideo2-scaling-video-foundation-models | 32.0 | 59.4 | 52.4 |
internvideo-general-video-foundation-models | 17.6 | 40.2 | 32.4 |
clover-towards-a-unified-video-language | 14.7 | 38.2 | 29.2 |
seeing-what-you-miss-vision-language-pre | 17.2 | 39.1 | 32.4 |
howtocaption-prompting-llms-to-transform | 27.7 | 54.6 | 46.5 |