Toward unsupervised, multi-object discovery in large-scale image collections

This paper addresses the problem of discovering the objects present in a collection of images without any supervision. We build on the optimization approach of Vo et al. (CVPR'19) with several key novelties: (1) We propose a novel saliency-based region proposal algorithm that achieves significantly higher overlap with ground-truth objects than other competitive methods. This procedure leverages off-the-shelf CNN features trained on classification tasks without any bounding box information, but is otherwise unsupervised. (2) We exploit the inherent hierarchical structure of proposals as an effective regularizer for the approach to object discovery of Vo et al., boosting its performance to significantly improve over the state of the art on several standard benchmarks. (3) We adopt a two-stage strategy to select promising proposals using small random sets of images before using the whole image collection to discover the objects it depicts, allowing us to tackle, for the first time (to the best of our knowledge), the discovery of multiple objects in each one of the pictures making up datasets with up to 20,000 images, an over five-fold increase compared to existing methods, and a first step toward true large-scale unsupervised image interpretation.