Video Retrieval on QuerYD
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
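
The R@K (recall at K) metrics above report the percentage of text queries for which the correct video appears among the top K retrieved results. A minimal sketch of that computation follows. It assumes a hypothetical similarity matrix in which entry `[i][j]` scores text query `i` against video `j`, with the ground-truth video for query `i` at index `i`. The function name `recall_at_k` and the toy scores are illustrative only and do not come from any of the listed papers.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    Assumption of this sketch: similarity[i][j] scores text query i against
    video j, and the ground-truth video for query i is at index i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Zero-based rank = number of videos scoring strictly higher
        # than the ground-truth video for this query.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up scores, for illustration only).
sim = [
    [0.9, 0.1, 0.3],  # query 0: true video ranked 1st
    [0.8, 0.5, 0.6],  # query 1: true video ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: true video ranked 2nd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
```

On this toy matrix one of three queries is a hit at K=1, two at K=2, and all three at K=3. Published evaluations may handle ties or multiple captions per video differently, so treat this as an illustration of the definition rather than any paper's evaluation code.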
Results
Performance of different models on this benchmark
Model | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
---|---|---|---|---|---|
LF-VILA | 69.7 | 90.3 | 85.7 | Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning | |
QB-Norm+TT-CE+ | 15.1 | - | - | Cross Modal Retrieval with Querybank Normalisation | |
VINDLU | 67.8 | 81.8 | 86.3 | VindLU: A Recipe for Effective Video-and-Language Pretraining | |
Frozen | 53.8 | 82.7 | 75.7 | Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | |
TESTA (ViT-B/16) | 83.4 | 95.3 | 93.8 | TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding | |