PERF-Net (distilled ResNet50-G) | 82.0 | PERF-Net: Pose Empowered RGB-Flow Net | - |
Florence (curated FLD-900M pretrain) | 87.8 | Florence: A New Foundation Model for Computer Vision | |
TokenLearner 16at18 w. Fuser (L/10) | 86.3 | TokenLearner: What Can 8 Learned Tokens Do for Images and Videos? | |
SlowFast 8x8 (ResNet-50) | 79.9 | SlowFast Networks for Video Recognition | |
SlowFast 16x8 (ResNet-101 + NL) | 81.8 | SlowFast Networks for Video Recognition | |