Open-World Semi-Supervised Learning

Cao, Kaidi; Brbic, Maria; Leskovec, Jure

Abstract

A fundamental limitation of applying semi-supervised learning in real-world settings is the assumption that unlabeled test data contains only classes previously encountered in the labeled training data. However, this assumption rarely holds for data in the wild, where instances belonging to novel classes may appear at test time. Here, we introduce a novel open-world semi-supervised learning setting that formalizes the notion that novel classes may appear in the unlabeled test data. In this setting, the goal is to resolve the class distribution mismatch between labeled and unlabeled data: at test time, every input instance must either be classified into one of the existing classes or be assigned to a newly initialized, previously unseen class. To tackle this challenging problem, we propose ORCA, an end-to-end deep learning approach that introduces an uncertainty-adaptive margin mechanism to circumvent the bias towards seen classes, which arises because discriminative features are learned faster for seen classes than for novel ones. In this way, ORCA reduces the gap between the intra-class variance of seen and novel classes. Experiments on image classification datasets and a single-cell annotation dataset demonstrate that ORCA consistently outperforms alternative baselines, achieving a 25% improvement on seen classes and a 96% improvement on novel classes of the ImageNet dataset.
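The abstract names an uncertainty-adaptive margin mechanism without spelling it out. The following is a minimal sketch of one generic way such a mechanism can work, not ORCA's exact loss. It assumes that (1) uncertainty is measured as one minus the maximum softmax probability, averaged over a batch of unlabeled samples, and (2) a margin proportional to that uncertainty is subtracted from the true-class logit in a scaled cross-entropy on labeled data. The function names and the `lam` and `scale` parameters are hypothetical.

```python
import numpy as np


def log_softmax(logits):
    # Numerically stable row-wise log-softmax.
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def mean_uncertainty(unlabeled_logits):
    # Assumed uncertainty measure: 1 - max class probability per unlabeled
    # sample, averaged over the batch.
    probs = np.exp(log_softmax(unlabeled_logits))
    return float(np.mean(1.0 - probs.max(axis=1)))


def adaptive_margin_ce(labeled_logits, labels, unlabeled_logits, lam=1.0, scale=10.0):
    # Hypothetical parameters: `lam` scales the margin (its sign is an
    # assumption), `scale` multiplies the logits before the softmax.
    margin = lam * mean_uncertainty(unlabeled_logits)
    adjusted = np.asarray(labeled_logits, dtype=float).copy()
    rows = np.arange(len(labels))
    # Subtract the uncertainty-driven margin from the true-class logit only.
    adjusted[rows, labels] -= margin
    log_probs = log_softmax(scale * adjusted)
    # Mean negative log-likelihood of the true labels.
    return float(-log_probs[rows, labels].mean())


# Usage: uniform unlabeled logits over 3 classes give uncertainty 2/3.
unlabeled = np.zeros((4, 3))
labeled = np.array([[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]])
labels = np.array([0, 1])
print(mean_uncertainty(unlabeled))
print(adaptive_margin_ce(labeled, labels, unlabeled, lam=1.0))
```

When the unlabeled predictions become confident, the margin shrinks toward zero and the loss reduces to ordinary scaled cross-entropy. In this sketch, a positive `lam` makes the supervised objective stricter while uncertainty is high and a negative `lam` relaxes it; which choice best counters the bias towards seen classes is left as an assumption here.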