HyperAI

Zero-Shot Video Retrieval on DiDeMo

Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
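
For readers unfamiliar with the metric names, text-to-video R@K (Recall at K) is conventionally the percentage of text queries whose matching video appears among the top K retrieved videos. The following is a minimal sketch of that computation, not the evaluation code used by any model on this page. The function name `recall_at_k`, the similarity-matrix layout, and the assumption that query i matches video i are illustrative choices; the toy scores are made up.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the ground-truth match for query i is video i.
    """
    hits = 0
    for i, scores in enumerate(similarity):
        # Zero-based rank of the correct video: the number of videos
        # that score strictly higher than it for this query.
        rank = sum(1 for s in scores if s > scores[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 example with made-up scores (not benchmark data).
sim = [
    [0.9, 0.1, 0.2],  # query 0: correct video ranked first
    [0.8, 0.5, 0.3],  # query 1: correct video ranked second
    [0.7, 0.6, 0.1],  # query 2: correct video ranked third
]

for k in (1, 2, 3):
    print(f"R@{k}: {recall_at_k(sim, k):.1f}")
```

Because a query counted at K is also counted at every larger K, R@1 ≤ R@5 ≤ R@10 holds for any single model, which is a quick sanity check when reading the figures below.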

Results

Performance results of various models on this benchmark

Comparison table
| Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 |
| --- | --- | --- | --- |
| revealing-single-frame-bias-for-video-and | 36.9 | 69.3 | 61.1 |
| internvideo2-scaling-video-foundation-models | 57.9 | 84.6 | 80.0 |
| one-for-all-video-conversation-is-feasible | 35.6 | 72.6 | 61.9 |
| languagebind-extending-video-language | 39.9 | 74.6 | 66.1 |
| hitea-hierarchical-temporal-aware-video | 43.2 | 79.0 | 69.3 |
| clover-towards-a-unified-video-language | 29.5 | 66.3 | 55.2 |
| languagebind-extending-video-language | 39.7 | 73.8 | 65.5 |
| mplug-2-a-modularized-multi-modal-foundation | 45.7 | 79.2 | 71.1 |
| vast-a-vision-audio-subtitle-text-omni-1 | 55.5 | 79.6 | 74.3 |
| revealing-single-frame-bias-for-video-and | 37.1 | 69.9 | 61.7 |
| violet-end-to-end-video-language-transformers | 23.5 | 59.8 | 49.8 |
| miles-visual-bert-pre-training-with-injected | 27.2 | 63.6 | 50.3 |
| gramian-multimodal-representation-learning | 54.2 | 80.7 | - |
| align-and-prompt-video-and-language-pre | 23.8 | 57.9 | 47.3 |
| internvideo-general-video-foundation-models | 31.5 | 68.2 | 57.6 |
| videoclip-contrastive-pre-training-for-zero | 16.6 | - | 46.9 |
| frozen-in-time-a-joint-video-and-image | 21.1 | 56.2 | 46.0 |
| bridgeformer-bridging-video-text-retrieval | 25.6 | 61.1 | 50.6 |
| hitea-hierarchical-temporal-aware-video | 36.1 | 70.3 | 60.1 |
| object-aware-video-language-pre-training-for | 23.5 | 59.8 | 50.4 |
| vid-tldr-training-free-token-merging-for | 52.0 | 81.0 | 74.0 |
| unmasked-teacher-towards-training-efficient | 48.6 | 79.0 | 72.9 |
| frozen-in-time-a-joint-video-and-image | 20.2 | 58.5 | 46.4 |
| omnivl-one-foundation-model-for-image | 33.3 | 68.5 | 58.7 |
| lat-latent-translation-with-cycle-consistency | 22.6 | 58.9 | 45.9 |
| internvideo2-scaling-video-foundation-models | 57.0 | 85.1 | 80.0 |