HyperAI

Video Retrieval on MSR-VTT

Metrics

text-to-video R@1
text-to-video R@10
text-to-video R@5
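As a rough illustration of what these metrics measure: text-to-video R@K (Recall at K) is commonly taken to be the percentage of text queries whose correct video appears among the top K retrieved results. The sketch below assumes a square similarity matrix in which the ground-truth video for query i is video i. The function name `recall_at_k` and the toy scores are hypothetical and do not come from any model on this leaderboard.

```python
def recall_at_k(similarity, k):
    """Percentage of text queries whose matching video ranks in the top k.

    similarity[i][j] is the score between text query i and video j.
    Assumption: the ground-truth video for query i is video i.
    """
    hits = 0
    for i, row in enumerate(similarity):
        # Rank of the correct video = number of videos scored strictly higher
        # (ties are therefore resolved in favour of the correct video).
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(similarity)


# Toy 3x3 similarity matrix (made-up numbers for illustration only).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.6],  # query 1: correct video ranked 3rd
    [0.2, 0.7, 0.4],  # query 2: correct video ranked 2nd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
```

Higher values are better, and for a given model R@1 ≤ R@5 ≤ R@10 always holds because the top-K set only grows with K.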

Results

Performance results of various models on this benchmark

| Model name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title | Repository |
| --- | --- | --- | --- | --- | --- |
| TEFAL | 52 | 86.1 | 76.6 | Audio-Enhanced Text-to-Video Retrieval using Text-Conditioned Feature Alignment | - |
| VideoCoCa (zero-shot) | 34.3 | 67.0 | 57.8 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| TACo | 24.8 | 64.0 | 52.1 | TACo: Token-aware Cascade Contrastive Learning for Video-Text Alignment | - |
| VIOLETv2 | 37.2 | 75.8 | 64.8 | An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling | |
| COSA | 57.9 | - | - | COSA: Concatenated Sample Pretrained Vision-Language Foundation Model | |
| CoCa (zero-shot) | 30.0 | 61.6 | 52.4 | CoCa: Contrastive Captioners are Image-Text Foundation Models | |
| CLIP | 21.4 | 50.4 | 41.1 | A Straightforward Framework For Video Retrieval Using CLIP | |
| RoME | 10.7 | 41.2 | 29.6 | RoME: Role-aware Mixture-of-Expert Transformer for Text-to-Video Retrieval | |
| InternVideo2-6B | 62.8 | - | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding | |
| C+LSTM+SA+FC7 | 4.2 | 19.9 | - | Learning Language-Visual Embedding for Movie Understanding with Natural-Language | - |
| VALOR | 59.9 | 89.6 | 83.5 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset | |
| GRAM | 64 | 89.3 | - | Gramian Multimodal Representation Learning and Alignment | |
| Aurora (ours, r=64) | 52.4 | 82 | 73.9 | - | - |
| Kaufman | 4.7 | 24.1 | - | Temporal Tessellation: A Unified Approach for Video Analysis | |
| Ours | 26 | - | 56.7 | Video and Text Matching with Conditioned Embeddings | |
| Text-Video Embedding | 14.9 | 52.8 | - | HowTo100M: Learning a Text-Video Embedding by Watching Hundred Million Narrated Video Clips | |
| FROZEN | 32.5 | 71.2 | 61.5 | Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval | |
| LAFF | 29.1 | 65.8 | 54.9 | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval | |
| All-in-one + MELTR | 38.6 | 84.7 | 74.4 | MELTR: Meta Loss Transformer for Learning to Fine-tune Video Foundation Models | |
| JSFusion | 10.2 | 43.2 | - | A Joint Sequence Fusion Model for Video Question Answering and Retrieval | |