Action Classification On Kinetics 600
Metrics
Top-1 Accuracy
Results
Performance results of various models on this benchmark
| Model Name | Top-1 Accuracy (%) | Paper Title | Repository |
| --- | --- | --- | --- |
| D3D+S3D-G | 79.1 | D3D: Distilled 3D Networks for Video Action Recognition | |
| XViT (x16) | 84.5 | Space-time Mixing Attention for Video Transformer | |
| MoViNet-A5 | 82.7 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| VideoMAE V2-g | 88.8 | VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking | |
| MoViNet-A2 | 77.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| PERF-Net (distilled ResNet50-G) | 82.0 | PERF-Net: Pose Empowered RGB-Flow Net | - |
| mPLUG-2 | 89.8 | mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video | |
| Florence (curated FLD-900M pretrain) | 87.8 | Florence: A New Foundation Model for Computer Vision | |
| MoViNet-A6 | 83.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition | |
| S3D-G (RGB) | 76.6 | Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification | |
| Model 11 | 89.7 | MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound | - |
| UniFormer-B (ImageNet-1K) | 84.8 | UniFormer: Unified Transformer for Efficient Spatial-Temporal Representation Learning | |
| LGD-3D Flow | 75 | Learning Spatio-Temporal Representation with Local and Global Diffusion | - |
| TokenLearner 16at18 w. Fuser (L/10) | 86.3 | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | |
| SlowFast 8x8 (ResNet-50) | 79.9 | SlowFast Networks for Video Recognition | |
| UMT-L (ViT-L/16) | 90.5 | Unmasked Teacher: Towards Training-Efficient Video Foundation Models | |
| SlowFast 16x8 (ResNet-101 + NL) | 81.8 | SlowFast Networks for Video Recognition | |
| EVA | 89.8 | EVA: Exploring the Limits of Masked Visual Representation Learning at Scale | |
| I3D (RGB) | 73.6 | A Short Note about Kinetics-600 | |
| TubeVit-L | 91.5 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning | |