SlowFast (Kinetics-600 pretraining, NL) | 45.2 | SlowFast Networks for Video Recognition | |
PoTion + (GCN + I3D + NL I3D) | 40.8 | PoTion: Pose MoTion Representation for Action Recognition | |
AdaFocus (weak supervision, MViT-B-K400-pretrain, 16x4) | 41.4 | Towards Weakly Supervised End-to-end Learning for Long-video Action Recognition | |
MViT-B, 32x3 (Kinetics-400 pretraining) | 44.3 | Multiscale Vision Transformers | |
JMRN + R101-NL-LFB | 43.23 | Pose And Joint-Aware Action Recognition | |
MViT-B-24, 32x3 (Kinetics-600 pretraining) | 47.7 | Multiscale Vision Transformers | |