Faster R-CNN (ImageNet+300M) | 58 | 40.1 | Revisiting Unreasonable Effectiveness of Data in Deep Learning Era | |
GLIP (Swin-L, multi-scale) | 79.5 | 67.7 | Grounded Language-Image Pre-training | |
Mask R-CNN (ResNeXt-101-FPN) | 62.3 | 43.4 | Mask R-CNN | |
ISTR (ResNet50-FPN-3x, single-scale) | - | - | ISTR: End-to-End Instance Segmentation with Transformers | |
CPNDet (Hourglass-104, multi-scale) | 67.3 | 53.7 | Corner Proposal Network for Anchor-free, Two-stage Object Detection | |
D-RFCN + SNIP (ResNet-101, multi-scale) | 65.5 | 48.4 | An Analysis of Scale Invariance in Object Detection - SNIP | - |
AC-FPN Cascade R-CNN (X-152-32x8d-FPN-IN5k, multi-scale, only CEM) | 70.4 | 57 | Attention-guided Context Feature Pyramid Network for Object Detection | |