Video Retrieval on SSv2 Template Retrieval
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
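The page does not state the evaluation protocol behind these metrics, so the following is only a minimal sketch of how text-to-video Recall@K is commonly computed. It assumes a similarity matrix in which row `i` is a text query and column `i` is its single ground-truth video. The function name `recall_at_k` and the toy scores are hypothetical, chosen for illustration.

```python
def recall_at_k(sim, k):
    """Percentage of text queries whose correct video ranks in the top k.

    Assumption: sim[i][j] is the similarity between text query i and
    video j, and the correct video for query i sits at index i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Rank of the correct video = number of videos scored strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy similarity matrix: 4 text queries x 4 videos (made-up scores).
sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.7, 0.1, 0.2],  # correct video ranked 2nd
    [0.1, 0.2, 0.9, 0.3],  # correct video ranked 1st
    [0.6, 0.5, 0.4, 0.3],  # correct video ranked 4th
]

for k in (1, 5, 10):
    print(f"text-to-video R@{k}: {recall_at_k(sim, k):.1f}")
```

With only four candidate videos, R@5 and R@10 are trivially 100 in this toy case. A small candidate pool has the same effect on any retrieval benchmark, which is worth keeping in mind when reading saturated R@5 and R@10 columns.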
Results
Performance results of various models on this benchmark
Model Name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
---|---|---|---|---|---|
vid-TLDR (UMT-L) | 90.2 | 100.0 | 100.0 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer | |
Singularity-temporal | 77.6 | 98.9 | 96 | Revealing Single Frame Bias for Video-and-Language Learning | |
VindLU | 83.3 | 100 | 100 | VindLU: A Recipe for Effective Video-and-Language Pretraining | |
UMT-L (ViT-L/16) | 90.8 | 100.0 | 100.0 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
HiTeA | 85.6 | 100 | 100 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | |