Video Retrieval On Condensed Movies

text-to-video R@1

text-to-video R@10

text-to-video R@5

Evaluation Results

Performance of each model on this benchmark

Model	text-to-video R@1	text-to-video R@10	text-to-video R@5	Paper Title
TESTA (ViT-B/16)	24.9	55.1	46.5	TESTA: Temporal-Spatial Token Aggregation for Long-form Video-Language Understanding
VINDLU	18.4	44.3	36.4	VindLU: A Recipe for Effective Video-and-Language Pretraining
LF-VILA	13.6	41.8	32.5	Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
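
For readers unfamiliar with the metric, text-to-video R@K (Recall at K) is usually reported as the percentage of text queries whose correct video appears among the top K retrieved videos. The following is a minimal sketch of that computation, not the evaluation code used by any of the papers listed. It assumes a square similarity matrix in which the correct video for query i is at index i; the function name `recall_at_k` and the toy scores are illustrative only.

```python
import numpy as np

def recall_at_k(sim, k):
    """Percentage of text queries whose correct video ranks in the top k.

    Assumption: sim[i, j] is the similarity between text query i and
    video j, and the ground-truth video for query i is video i.
    """
    sim = np.asarray(sim, dtype=float)
    # Score of the correct video for each query, as a column vector.
    correct = np.diag(sim)[:, None]
    # Zero-based rank: how many videos score strictly higher than the correct one.
    ranks = (sim > correct).sum(axis=1)
    return 100.0 * float(np.mean(ranks < k))

# Toy example with 3 queries and 3 videos (made-up scores).
sim = [
    [0.9, 0.1, 0.2],  # correct video ranked first
    [0.8, 0.3, 0.5],  # correct video ranked third
    [0.1, 0.2, 0.7],  # correct video ranked first
]
r1 = recall_at_k(sim, 1)
r3 = recall_at_k(sim, 3)
```

In this toy case two of the three queries retrieve the correct video at rank 1, so R@1 is about 66.7, and every query retrieves it within the top 3, so R@3 is 100. Because the top-K set grows with K, R@1 ≤ R@5 ≤ R@10 always holds, which matches the values in the table above.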
