HyperAI

Action Classification On Charades

Metrics

mAP (mean average precision)
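Charades is a multi-label benchmark: a single video can contain several actions at once. mAP is therefore usually computed by ranking all test videos separately for each action class, computing the average precision (AP) for that class, and averaging the per-class APs. The Python sketch below illustrates this under those assumptions. The function names are hypothetical, and the official Charades evaluation script may differ in details such as tie handling or how classes without positives are treated. The scores on this leaderboard are given as percentages, so 59.8 corresponds to an mAP of 0.598.

```python
from typing import List, Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP for one class: mean of precision@k over the ranks k of the positives."""
    # Rank videos by descending score; Python's sort is stable, so ties keep input order.
    # (Assumption: real evaluators may break ties differently.)
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    num_pos = sum(labels)
    if num_pos == 0:
        raise ValueError("class has no positive samples")
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / num_pos


def mean_average_precision(scores: List[List[float]], labels: List[List[int]]) -> float:
    """mAP over classes; scores and labels are [num_videos][num_classes] matrices."""
    num_classes = len(scores[0])
    aps = []
    for c in range(num_classes):
        col_scores = [row[c] for row in scores]
        col_labels = [row[c] for row in labels]
        if sum(col_labels) == 0:
            continue  # assumption: skip classes with no positives in this split
        aps.append(average_precision(col_scores, col_labels))
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Toy example: 3 videos, 2 action classes (multi-label ground truth).
    scores = [[0.9, 0.1], [0.8, 0.7], [0.2, 0.6]]
    labels = [[1, 0], [0, 1], [1, 1]]
    print(f"{mean_average_precision(scores, labels):.4f}")  # prints 0.9167
```

In the toy example, class 0 gets AP = (1/1 + 2/3) / 2 = 5/6 and class 1 gets AP = 1, so the mAP is 11/12, or about 91.7 on the percentage scale used by the leaderboard.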

Results

Performance of various models on this benchmark

Model name | mAP | Paper title
AssembleNet++ 50 | 59.8 | AssembleNet++: Assembling Modality Representations via Attention Connections
SlowFast (Kinetics-600 pretraining, NL) | 45.2 | SlowFast Networks for Video Recognition
TokenLearner | 66.3 | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos?
Asyn-TF | 22.4 | Asynchronous Temporal Fields for Action Recognition
JMRN (Pose only) | 16.2 | Pose And Joint-Aware Action Recognition
PoTion + (GCN + I3D + NL I3D) | 40.8 | PoTion: Pose MoTion Representation for Action Recognition
VicTR (ViT-L/14) | 57.6 | VicTR: Video-conditioned Text Representations for Activity Recognition
STLT + I3D | 38.5 | Revisiting spatio-temporal layouts for compositional action recognition
LFB | 42.5 | Long-Term Feature Banks for Detailed Video Understanding
AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4) | 41.4 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition
BIKE | 50.7 | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
DEEP-HAL with ODF+SDF (I3D) | 50.16 | Self-supervising Action Recognition by Statistical Moment and Subspace Descriptors
MoViNet-A2 | 32.5 | MoViNets: Mobile Video Networks for Efficient Video Recognition
ActionCLIP (ViT-B/16) | 44.3 | ActionCLIP: A New Paradigm for Video Action Recognition
MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | Multiscale Vision Transformers
JMRN + R101-NL-LFB | 43.23 | Pose And Joint-Aware Action Recognition
TubeViT-L | 66.2 | Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | Multiscale Vision Transformers
HAF+BoW/FV/OFF halluc. +MSK×8/PN | 43.1 | Hallucinating IDT Descriptors and I3D Optical Flow Features for Action Recognition with CNNs
I3D | 32.9 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset