Zero Shot Action Recognition On Kinetics
Metrics
Top-1 Accuracy
Top-5 Accuracy
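The page lists these two metrics without defining them. A minimal sketch, assuming the usual definition (a sample counts as correct when its true class is among the k highest-scoring predicted classes) and assuming scores are reported as percentages, as the values in the results appear to be. The function name `top_k_accuracy` and the toy scores are hypothetical and do not come from any of the listed papers, whose evaluation protocols may differ in details such as class splits.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy example (made-up scores): 3 clips, 4 candidate action classes.
scores = [
    [0.10, 0.60, 0.20, 0.10],  # best class 1, true class 1
    [0.50, 0.10, 0.30, 0.10],  # best class 0, true class 2 is second best
    [0.30, 0.15, 0.05, 0.50],  # best class 3, true class 1 is third best
]
labels = [1, 2, 1]

top1 = top_k_accuracy(scores, labels, k=1)  # only the first clip is correct
top2 = top_k_accuracy(scores, labels, k=2)  # the first two clips are correct
```

Top-5 accuracy is the same computation with `k=5`. In the zero-shot setting the candidate classes are typically ones the model was not trained on, but the scoring itself is unchanged.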
Results
Performance results of various models on this benchmark
| Model name | Top-1 Accuracy | Top-5 Accuracy | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| OST | 75.1 | 94.6 | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | |
| DEM | 23.6 | 49.5 | Learning a Deep Embedding Model for Zero-Shot Learning | |
| ER-ZSAR (ST+Obj) | 42.1 | 73.1 | Elaborative Rehearsal for Zero-shot Action Recognition | |
| IMP-MoE-L | 76.8 | - | Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | - |
| MAXI | 71.6 | - | MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | |
| BIKE | 68.5 | 91.1 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | |
| ALE | 23.4 | 50.3 | Label-Embedding for Image Classification | |
| Text4Vis | 68.9 | 90.3 | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | |
| VideoCoCa | 70.1 | 88.9 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| X-CLIP | 65.2 | 86.1 | Expanding Language-Image Pretrained Models for General Video Recognition | |
| JigsawNet | 45.9 | 78.8 | Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions | |
| LoCATe-GAT | 58.7 | - | LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | |
| ER-ZSAR (ST) | 37.1 | 69.3 | Elaborative Rehearsal for Zero-shot Action Recognition | |
| GCN | 22.3 | 49.7 | All About Knowledge Graphs for Actions | - |
| LanguageBind | 64.1 | 85.7 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| SJE (Word Embedding) | 22.3 | 48.2 | Evaluation of Output Embeddings for Fine-Grained Image Classification | |
| DEVISE | 23.8 | 51.0 | DeViSE: A Deep Visual-Semantic Embedding Model | - |
| OTI (ViT-L/14) | 70.6 | - | Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | |
| TC-CLIP | 78.1 | 95.7 | Leveraging Temporal Contextualization for Video Action Recognition | |
| ESZSL | 22.9 | 48.3 | An embarrassingly simple approach to zero-shot learning | |