HyperAI

Video Retrieval on LSMDC

Metrics

text-to-video Mean Rank
text-to-video R@1
text-to-video R@10
text-to-video R@5
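
The page does not define these metrics. As a minimal sketch, assuming the conventional definitions (R@K is the percentage of text queries whose correct video appears in the top K results, and Mean Rank is the average position of the correct video), they could be computed from a text-to-video similarity matrix as follows. The function name `retrieval_metrics`, the assumption of exactly one correct video per query (on the diagonal), and the optimistic handling of ties are illustrative choices, not taken from the benchmark.

```python
def retrieval_metrics(sim, ks=(1, 5, 10)):
    """Compute R@K and Mean Rank for text-to-video retrieval.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the correct video for query i is video i.
    """
    ranks = []
    for i, row in enumerate(sim):
        target = row[i]
        # Rank is 1 plus the number of videos scored strictly higher
        # than the correct one (ties are resolved optimistically).
        rank = 1 + sum(1 for score in row if score > target)
        ranks.append(rank)

    n = len(ranks)
    # R@K as a percentage of queries whose correct video is ranked <= K.
    metrics = {f"R@{k}": 100.0 * sum(r <= k for r in ranks) / n for k in ks}
    metrics["Mean Rank"] = sum(ranks) / n
    return metrics


# Toy example: 4 text queries against 4 videos.
toy_sim = [
    [0.9, 0.1, 0.2, 0.3],  # correct video ranked 1st
    [0.8, 0.5, 0.1, 0.2],  # correct video ranked 2nd
    [0.3, 0.4, 0.2, 0.1],  # correct video ranked 3rd
    [0.1, 0.2, 0.3, 0.7],  # correct video ranked 1st
]
toy_metrics = retrieval_metrics(toy_sim, ks=(1, 2))
print(toy_metrics)
```

Higher R@K is better, while a lower Mean Rank is better, which is why the two kinds of columns in the results table move in opposite directions.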

Results

Performance results of various models on this benchmark

| Model Name | text-to-video Mean Rank | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
|---|---|---|---|---|---|---|
| CAMoE | 54.4 | 25.9 | 53.7 | 46.1 | Improving Video-Text Retrieval by Multi-Stream Corpus Alignment and Dual Softmax Loss | - |
| HD-VILA | - | 17.4 | 44.1 | 34.1 | Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions | - |
| CLIP4Clip | 58.0 | 21.6 | 49.8 | 41.8 | CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval | - |
| Collaborative Experts | - | 11.2 | 34.8 | 26.9 | Use What You Have: Video Retrieval Using Representations From Collaborative Experts | - |
| MDMMT | 58.0 | 18.8 | 47.9 | 38.5 | MDMMT: Multidomain Multimodal Transformer for Video Retrieval | - |
| EMCL-Net | - | 23.9 | 50.9 | 42.4 | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | - |
| VALOR | - | 34.2 | 64.1 | 56.0 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | - |
| X-Pool | 53.2 | 25.2 | 53.5 | 43.7 | X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval | - |
| EMCL-Net (Ours)++ LSMDC Rohrbach et al. (2015) | 8 | - | 53.7 | - | Expectation-Maximization Contrastive Learning for Compact Video-and-Language Representations | - |
| HunYuan_tvr (huge) | 3.9 | 40.4 | 92.8 | 80.1 | Tencent Text-Video Retrieval: Hierarchical Cross-Modal Interactions with Multi-Level Representations | - |
| InternVideo | - | 34.0 | - | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning | - |
| MDMMT-2 | 48.0 | 26.9 | 55.9 | 46.7 | MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization | - |
| CLIP | - | 11.3 | 29.2 | 22.7 | A Straightforward Framework For Video Retrieval Using CLIP | - |
| X-CLIP | - | 26.1 | - | - | X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval | - |
| HiTeA | - | 28.7 | 59.0 | 50.3 | HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training | - |
| MoEE | - | 10.1 | 34.6 | 25.6 | Learning a Text-Video Embedding from Incomplete and Heterogeneous Data | - |
| MMT-Pretrained | - | 13.5 | 40.1 | 29.9 | Multi-modal Transformer for Video Retrieval | - |
| QB-Norm+CLIP4Clip | - | 22.4 | 49.5 | 40.1 | Cross Modal Retrieval with Querybank Normalisation | - |
| CenterCLIP (ViT-B/16) | 47.3 | 24.2 | 55.9 | 46.2 | CenterCLIP: Token Clustering for Efficient Text-Video Retrieval | - |
| VIOLETv2 | - | 24 | 54.1 | 43.5 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | - |