Slow Fusion + Fine-tune top 3 layers | 65.4 | Large-Scale Video Classification with Convolutional Neural Networks | |
Two-Stream I3D (Kinetics pre-training) | 97.8 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | |
R(2+1)D-Flow (Sports-1M pretrained) | 93.3 | A Closer Look at Spatiotemporal Convolutions for Action Recognition | |
R(2+1)D-RGB (Sports-1M pretrained) | 93.6 | A Closer Look at Spatiotemporal Convolutions for Action Recognition | |
Flow-I3D (ImageNet+Kinetics pre-training) | 96.7 | Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset | |
MF-Net, RGB only (ImageNet+Kinetics pretrained) | 96.0 | Multi-Fiber Networks for Video Recognition | - |