HyperAI
Zero-Shot Video Retrieval on ActivityNet
Metrics
text-to-video R@1
video-to-text R@1
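Both metrics are Recall@1: the percentage of queries whose top-ranked candidate is the correct match, computed once with text queries over videos and once with video queries over texts. The sketch below shows that computation under stated assumptions. It assumes a one-to-one pairing (text i belongs with video i) and cosine similarity between embeddings. The helper names (`similarity_matrix`, `recall_at_1`, `zero_shot_r1`) are hypothetical, and this is not the evaluation code of any model listed here. Individual papers may score or post-process similarities differently.

```python
import math


def l2_normalize(vectors):
    """Scale each vector to unit length so dot products equal cosine similarity."""
    normalized = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        normalized.append([x / norm for x in v])
    return normalized


def similarity_matrix(text_emb, video_emb):
    """Return sim where sim[i][j] is the cosine similarity of text i and video j."""
    texts = l2_normalize(text_emb)
    videos = l2_normalize(video_emb)
    return [[sum(a * b for a, b in zip(t, v)) for v in videos] for t in texts]


def recall_at_1(sim):
    """Percentage of rows whose highest-scoring column is the paired one.

    Assumes a one-to-one pairing: the correct match for row i is column i.
    On ties, max() keeps the first (lowest-index) column.
    """
    hits = 0
    for i, row in enumerate(sim):
        best = max(range(len(row)), key=lambda j: row[j])
        hits += int(best == i)
    return 100.0 * hits / len(sim)


def zero_shot_r1(sim):
    """Hypothetical helper: both retrieval directions from one text-by-video matrix."""
    transposed = [list(col) for col in zip(*sim)]  # rows become videos
    return {
        "text-to-video R@1": recall_at_1(sim),
        "video-to-text R@1": recall_at_1(transposed),
    }


if __name__ == "__main__":
    # Toy 3x3 scores (rows: texts, columns: videos), not real benchmark data.
    sim = [
        [0.9, 0.8, 0.0],
        [0.0, 0.5, 0.0],
        [0.0, 0.6, 0.7],
    ]
    print(zero_shot_r1(sim))
```

In the toy matrix, each of the three texts ranks its own video first, so text-to-video R@1 is 100. Video 1, however, scores highest against text 0, so only 2 of 3 videos retrieve their own text and video-to-text R@1 is about 66.7. This is why the two directions in the table can differ for the same model.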
Results
Performance results of various models on this benchmark
Model name | text-to-video R@1 (%) | video-to-text R@1 (%) | Paper title | Repository
InternVideo | 30.7 | 31.4 | InternVideo: General Video Foundation Models via Generative and Discriminative Learning |
VideoCoCa | 34.5 | 33.0 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | -
vid-TLDR (UMT-L) | 42.8 | 41.2 | vid-TLDR: Training Free Token merging for Light-weight Video Transformer |
GRAM | 59.0 | 50.9 | Gramian Multimodal Representation Learning and Alignment |
InternVideo2-1B | 60.4 | 54.8 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
LanguageBind (ViT-L/14) | 38.4 | 35.7 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment |
UMT-L (ViT-L/16) | 42.8 | 40.7 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models |
Singularity-temporal-17M | 30.6 | - | Revealing Single Frame Bias for Video-and-Language Learning |
LanguageBind (ViT-H/14) | 41.0 | 39.1 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment |
InternVideo2-6B | 63.2 | 56.5 | InternVideo2: Scaling Foundation Models for Multimodal Video Understanding |
BT-Adapter | 37.0 | - | BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning |
Singularity-temporal-5M | 30.8 | - | Revealing Single Frame Bias for Video-and-Language Learning |