Flow-based (ResNet-152) | 73.7 | 91.9 | 81.1 | 80.0 | 70.3 | 79.0 | Simple Baselines for Human Pose Estimation and Tracking | |
PPE (ResNeXt-101) | 75.7 | 90.3 | 76.3 | 79.5 | 80.7 | - | Deep Multi-Task Networks For Occluded Pedestrian Pose Estimation | - |
OmniPose (WASPv2) | 76.4 | 92.6 | 83.7 | 82.6 | 72.6 | 81.2 | OmniPose: A Multi-Scale Framework for Multi-Person Pose Estimation | |
Faster R-CNN (ImageNet+300M) | 64.4 | 85.7 | 70.7 | 69.8 | 61.8 | - | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | |
Lite-HRNet-30 | 69.7 | 90.7 | 77.5 | 75.0 | 66.9 | 75.4 | Lite-HRNet: A Lightweight High-Resolution Network | |
TFPose (ND=6 ResNet-50) | 72.2 | 90.9 | 80.1 | 78.8 | 69.1 | - | TFPose: Direct Human Pose Estimation with Transformers | - |
TransPose-H-A6 | 75.0 | 92.2 | 82.3 | 81.1 | 71.3 | - | TransPose: Keypoint Localization via Transformer | |
ViTPose (ViTAE-G, ensemble) | 81.1 | 95.0 | 88.2 | 86.0 | 77.8 | 85.6 | ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation | |
ViTPose (ViTAE-G) | 80.9 | 94.8 | 88.1 | 85.9 | 77.5 | 85.4 | ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation | |
RMPE++ | 72.3 | 89.2 | 79.1 | 78.6 | 68.0 | - | RMPE: Regional Multi-person Pose Estimation | |