EMCL-Net (Ours)++, LSMDC (Rohrbach et al., 2015) | 8 | - | 53.7 | - | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | |
MMT-Pretrained | - | 13.5 | 40.1 | 29.9 | Multi-modal Transformer for Video Retrieval | |
QB-Norm+CLIP4Clip | - | 22.4 | 49.5 | 40.1 | Cross Modal Retrieval with Querybank Normalisation | |
CenterCLIP (ViT-B/16) | 47.3 | 24.2 | 55.9 | 46.2 | CenterCLIP: Token Clustering for Efficient Text-Video Retrieval | |