Zero-Shot Video Retrieval on ActivityNet
Metrics
text-to-video R@1
video-to-text R@1
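Both metrics are Recall@1 (R@1), one per retrieval direction: the percentage of queries whose correct match is ranked first. The sketch below is a minimal illustration and assumes a square similarity matrix in which text query i is paired with video i. The helper names `recall_at_k` and `transpose` are hypothetical and are not taken from any benchmark toolkit. Video-to-text R@1 reuses the same computation on the transposed matrix.

```python
from typing import List, Sequence


def recall_at_k(sim: Sequence[Sequence[float]], k: int = 1) -> float:
    """Percentage of queries (rows) whose ground-truth item ranks in the top k.

    Assumption: sim is square and row i's correct match is column i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct item = number of candidates scored strictly higher.
        # Ties are therefore resolved in favour of the correct item (optimistic).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim: Sequence[Sequence[float]]) -> List[List[float]]:
    """Swap roles: rows become videos, columns become texts."""
    return [list(col) for col in zip(*sim)]


# Rows are text queries, columns are videos (toy values).
sim = [
    [0.9, 0.5, 0.3],  # text 0: video 0 scores highest -> hit
    [0.2, 0.4, 0.8],  # text 1: video 2 outranks video 1 -> miss
    [0.1, 0.3, 0.7],  # text 2: video 2 scores highest -> hit
]

print(round(recall_at_k(sim, 1), 2))             # text-to-video R@1: 66.67
print(round(recall_at_k(transpose(sim), 1), 2))  # video-to-text R@1: 33.33
```

The two directions can differ on the same matrix, which is why the leaderboard reports them separately and why some entries list only one of them.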
Results
Performance of various models on this benchmark
Comparison table
Model name | text-to-video R@1 | video-to-text R@1 |
---|---|---|
internvideo-general-video-foundation-models | 30.7 | 31.4 |
video-text-modeling-with-zero-shot-transfer | 34.5 | 33.0 |
vid-tldr-training-free-token-merging-for | 42.8 | 41.2 |
gramian-multimodal-representation-learning | 59.0 | 50.9 |
internvideo2-scaling-video-foundation-models | 60.4 | 54.8 |
languagebind-extending-video-language | 38.4 | 35.7 |
unmasked-teacher-towards-training-efficient | 42.8 | 40.7 |
revealing-single-frame-bias-for-video-and | 30.6 | - |
languagebind-extending-video-language | 41.0 | 39.1 |
internvideo2-scaling-video-foundation-models | 63.2 | 56.5 |
one-for-all-video-conversation-is-feasible | 37.0 | - |
revealing-single-frame-bias-for-video-and | 30.8 | - |