Video Retrieval on VATEX
Metrics
text-to-video R@1
text-to-video R@10
text-to-video R@5
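As a rough illustration of what these metrics measure, text-to-video R@K (Recall at K) is commonly computed as the percentage of text queries whose correct video appears among the top K retrieved results. The sketch below assumes a one-to-one pairing in which query i matches video i; the function name `recall_at_k` and the toy similarity matrix are hypothetical, and the individual papers may use different evaluation protocols (for example, several captions per video).

```python
def recall_at_k(sim, k):
    """Percentage of text queries whose matching video ranks in the top k.

    sim[i][j] is the similarity between text query i and video j.
    Assumption: the ground-truth match for query i is video i.
    """
    hits = 0
    for i, row in enumerate(sim):
        # Zero-based rank of the correct video: how many videos score strictly higher.
        rank = sum(1 for score in row if score > row[i])
        if rank < k:
            hits += 1
    return 100.0 * hits / len(sim)


# Toy 3x3 similarity matrix (made-up numbers, not benchmark data).
sim = [
    [0.9, 0.1, 0.3],  # query 0: correct video ranked 1st
    [0.8, 0.5, 0.2],  # query 1: correct video ranked 2nd
    [0.7, 0.6, 0.4],  # query 2: correct video ranked 3rd
]

for k in (1, 2, 3):
    print(f"R@{k} = {recall_at_k(sim, k):.1f}")
```

On this toy matrix the recall grows with K, which is why R@10 is never lower than R@5 or R@1 for the same model.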
Results
Performance results of various models on this benchmark
| Model Name | text-to-video R@1 | text-to-video R@10 | text-to-video R@5 | Paper Title |
|---|---|---|---|---|
| GRAM | 87.7 | 100 | - | Gramian Multimodal Representation Learning and Alignment |
| VAST | 83.0 | 99.2 | 98.2 | VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset |
| VALOR | 78.5 | 98.7 | 97.1 | VALOR: Vision-Audio-Language Omni-Perception Pretraining Model and Dataset |
| InternVideo2-6B | 75.5 | - | - | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
| Unmasked Teacher | 72 | 97.8 | 95.1 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
| InternVideo | 71.1 | - | - | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
| Side4Video | 68.8 | 97.0 | 93.5 | Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning |
| Cap4Video | 66.6 | 97.0 | 93.1 | Cap4Video: What Can Auxiliary Captions Do for Text-Video Retrieval? |
| TeachCLIP | 63.6 | 96.1 | 91.9 | Holistic Features are almost Sufficient for Text-to-Video Retrieval |
| TS2-Net | 59.1 | 95.2 | - | TS2-Net: Token Shift and Selection Transformer for Text-Video Retrieval |
| LAFF | 59.1 | 91.7 | - | Lightweight Attentional Feature Fusion: A New Baseline for Text-to-Video Retrieval |
| QB-Norm+CLIP2Video | 58.8 | 93.8 | - | Cross Modal Retrieval with Querybank Normalisation |
| CLIP2Video | 57.3 | 90 | - | CLIP2Video: Mastering Video-Text Retrieval via Image CLIP |