I3D w/ RPN + JFT (Kinetics-400 pretraining) | 22.8 | A Better Baseline for AVA | - |
SlowFast (Kinetics-400 pretraining) | 26.3 | SlowFast Networks for Video Recognition | |
LFB (Kinetics-400 pretraining) | 27.7 | Long-Term Feature Banks for Detailed Video Understanding | |
S3D-G w/ ResNet RPN (Kinetics-400 pretraining) | 22.0 | AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions | |
I3D w/ RPN (Kinetics-400 pretraining) | 21.9 | A Better Baseline for AVA | - |
SlowFast++ (Kinetics-600 pretraining, NL) | 28.3 | SlowFast Networks for Video Recognition | |
SlowFast (Kinetics-600 pretraining, NL) | 27.3 | SlowFast Networks for Video Recognition | |
I3D Tx HighRes | 27.6 | Video Action Transformer Network | - |
SlowFast (Kinetics-600 pretraining) | 26.8 | SlowFast Networks for Video Recognition | |
JMRN + SlowFast-R101-NL | 28.4 | Pose And Joint-Aware Action Recognition | |
D3D (ResNet RPN, Kinetics-400 pretraining) | 23.0 | D3D: Distilled 3D Networks for Video Action Recognition | |
ACAR-Net, SlowFast R-101 (Kinetics-400 pretraining) | 30.0 | Actor-Context-Actor Relation Network for Spatio-Temporal Action Localization | |