Forest R-CNN: Large-Vocabulary Long-Tailed Object Detection and Instance Segmentation

Despite the previous success of object analysis, detecting and segmenting a large number of object categories with a long-tailed data distribution remains a challenging and under-investigated problem. For a large-vocabulary classifier, the chance of obtaining noisy logits is much higher, which can easily lead to wrong recognition. In this paper, we exploit prior knowledge of the relations among object categories to cluster fine-grained classes into coarser parent classes, and construct a classification tree that is responsible for parsing an object instance into a fine-grained category via its parent class. In the classification tree, as the number of parent class nodes is significantly smaller, their logits are less noisy and can be utilized to suppress the wrong/noisy logits present in the fine-grained class nodes. As the way to construct the parent classes is not unique, we further build multiple trees to form a classification forest in which each tree contributes its vote to the fine-grained classification. To alleviate the imbalanced learning caused by the long-tail phenomenon, we propose a simple yet effective resampling method, NMS Resampling, to re-balance the data distribution. Our method, termed Forest R-CNN, can serve as a plug-and-play module applied to most object recognition models for recognizing more than 1000 categories. Extensive experiments are performed on the large-vocabulary dataset LVIS. Compared with the Mask R-CNN baseline, Forest R-CNN significantly boosts performance, with 11.5% and 3.9% AP improvements on the rare categories and overall categories, respectively. Moreover, we achieve state-of-the-art results on the LVIS dataset. Code is available at https://github.com/JialianW/Forest_RCNN.
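The core idea above — rescaling noisy fine-grained logits by the less noisy parent-class confidence, then averaging votes across several trees — can be sketched as follows. This is a hypothetical illustration, not the authors' implementation (see the linked repository for that); the `parent_of` groupings, function names, and the multiplicative calibration rule are all illustrative assumptions.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D logit vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def tree_scores(fine_logits, parent_logits, parent_of):
    """One tree's vote: fine-grained probabilities rescaled by the
    probability of each class's parent node.

    parent_of[i] gives the index of fine class i's parent, so a fine
    class whose parent node receives low confidence is suppressed even
    if its own (noisier) logit is spuriously high.
    """
    p_fine = softmax(fine_logits)
    p_parent = softmax(parent_logits)
    return p_fine * p_parent[parent_of]

def forest_scores(fine_logits, trees):
    """Average the per-tree scores: each tree casts one vote.

    `trees` is a list of (parent_logits, parent_of) pairs, one per
    grouping of the fine classes into parent classes.
    """
    votes = [tree_scores(fine_logits, pl, po) for pl, po in trees]
    return np.mean(votes, axis=0)

# Toy example: 4 fine classes; tree 1 groups {0,1} and {2,3},
# tree 2 groups {0,2} and {1,3} (both groupings are made up).
fine = np.array([2.0, 1.0, 0.5, 0.1])
trees = [
    (np.array([1.0, 0.0]), np.array([0, 0, 1, 1])),
    (np.array([0.5, 0.2]), np.array([0, 1, 0, 1])),
]
scores = forest_scores(fine, trees)
```

In a real detector these scores would replace the raw classifier softmax before NMS; the sketch only conveys how coarse-level confidence can gate fine-level predictions.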