Video Retrieval On Condensed Movies
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
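
The R@K metrics listed above are presumably the usual Recall@K for retrieval: the percentage of text queries whose matching video appears among the top K ranked results. The following is a minimal sketch under that assumption, not the benchmark's official evaluation code. The function name `recall_at_k`, the toy similarity matrix, and the convention that query `i` matches video `i` are all hypothetical choices made for illustration.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    Assumes similarity[i][j] scores text query i against video j, and that
    the correct video for query i is video i (an illustrative convention).
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct video = number of videos scoring strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy example: 3 text queries scored against 3 videos (made-up values).
sim = [
    [0.9, 0.1, 0.3],  # correct video ranked first
    [0.8, 0.2, 0.5],  # correct video ranked third
    [0.2, 0.4, 0.6],  # correct video ranked first
]

for k in (1, 2, 3):
    print(f"text-to-video R@{k}: {recall_at_k(sim, k):.1f}")
```

Under this definition R@K can only stay the same or grow as K increases, which matches the pattern R@1 ≤ R@5 ≤ R@10 in every row of the results table.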
Results
Performance results of various models on this benchmark
Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper title | Repository |
---|---|---|---|---|---|
VINDLU | 18.4 | 44.3 | 36.4 | VindLU: A Recipe for Effective Video-and-Language Pretraining | |
LF-VILA | 13.6 | 41.8 | 32.5 | Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | |
TESTA (ViT-B/16) | 24.9 | 55.1 | 46.5 | TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | |