HyperAI

Zero-Shot Action Recognition on Kinetics

Metrics

Top-1 Accuracy
Top-5 Accuracy
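
As an illustration of how these two metrics are usually computed, here is a minimal sketch in plain Python. It assumes each clip has one score per candidate class (for zero-shot models, typically a similarity between the video embedding and a text embedding of the class name). The function name `top_k_accuracy` and the toy data are hypothetical and do not come from any of the listed papers. The toy example uses k=2 because it has only three classes; the benchmark reports k=1 and k=5.

```python
def top_k_accuracy(scores, labels, k):
    """Percentage of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score, truncated to the best k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy data (assumed for illustration): 4 clips, 3 candidate classes.
scores = [
    [0.7, 0.2, 0.1],  # true class 0 is ranked first
    [0.1, 0.3, 0.6],  # true class 1 is ranked second
    [0.5, 0.4, 0.1],  # true class 2 is ranked third
    [0.2, 0.5, 0.3],  # true class 1 is ranked first
]
labels = [0, 1, 2, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"Top-1: {top1:.1f}%  Top-2: {top2:.1f}%")  # Top-1: 50.0%  Top-2: 75.0%
```

Top-k accuracy can never be lower than top-1 accuracy on the same predictions, which is why every Top-5 value in the results is higher than the corresponding Top-1 value.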

Results

Performance results of various models on this benchmark

| Model name | Top-1 Accuracy (%) | Top-5 Accuracy (%) | Paper Title | Repository |
|---|---|---|---|---|
| OST | 75.1 | 94.6 | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | |
| DEM | 23.6 | 49.5 | Learning a Deep Embedding Model for Zero-Shot Learning | |
| ER-ZSAR (ST+Obj) | 42.1 | 73.1 | Elaborative Rehearsal for Zero-shot Action Recognition | |
| IMP-MoE-L | 76.8 | - | Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | - |
| MAXI | 71.6 | - | MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | |
| BIKE | 68.5 | 91.1 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | |
| ALE | 23.4 | 50.3 | Label-Embedding for Image Classification | |
| Text4Vis | 68.9 | 90.3 | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | |
| VideoCoCa | 70.1 | 88.9 | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | - |
| X-CLIP | 65.2 | 86.1 | Expanding Language-Image Pretrained Models for General Video Recognition | |
| JigsawNet | 45.9 | 78.8 | Rethinking Zero-shot Action Recognition: Learning from Latent Atomic Actions | |
| LoCATe-GAT | 58.7 | - | LoCATe-GAT: Modeling Multi-Scale Local Context and Action Relationships for Zero-Shot Action Recognition | |
| ER-ZSAR (ST) | 37.1 | 69.3 | Elaborative Rehearsal for Zero-shot Action Recognition | |
| GCN | 22.3 | 49.7 | All About Knowledge Graphs for Actions | - |
| LanguageBind | 64.1 | 85.7 | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | |
| SJE (Word Embedding) | 22.3 | 48.2 | Evaluation of Output Embeddings for Fine-Grained Image Classification | |
| DEVISE | 23.8 | 51.0 | DeViSE: A Deep Visual-Semantic Embedding Model | - |
| OTI (ViT-L/14) | 70.6 | - | Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | |
| TC-CLIP | 78.1 | 95.7 | Leveraging Temporal Contextualization for Video Action Recognition | |
| ESZSL | 22.9 | 48.3 | An embarrassingly simple approach to zero-shot learning | |