Learning to Discover and Detect Objects

We tackle the problem of novel class discovery and localization (NCDL). Inthis setting, we assume a source dataset with supervision for only some objectclasses. Instances of other classes need to be discovered, classified, andlocalized automatically based on visual similarity without any humansupervision. To tackle NCDL, we propose a two-stage object detection networkRegion-based NCDL (RNCDL) that uses a region proposal network to localizeregions of interest (RoIs). We then train our network to learn to classify eachRoI, either as one of the known classes, seen in the source dataset, or one ofthe novel classes, with a long-tail distribution constraint on the classassignments, reflecting the natural frequency of classes in the real world. Bytraining our detection network with this objective in an end-to-end manner, itlearns to classify all region proposals for a large variety of classes,including those not part of the labeled object class vocabulary. Ourexperiments conducted using COCO and LVIS datasets reveal that our method issignificantly more effective than multi-stage pipelines that rely ontraditional clustering algorithms. Furthermore, we demonstrate the generalityof our approach by applying our method to a large-scale Visual Genome dataset,where our network successfully learns to detect various semantic classeswithout direct supervision.