Universal Lesion Detection by Learning from Multiple Heterogeneously Labeled Datasets

Lesion detection is an important problem within medical imaging analysis. Most previous work focuses on detecting and segmenting a specialized category of lesions (e.g., lung nodules). However, in clinical practice, radiologists are responsible for finding all possible types of anomalies. The task of universal lesion detection (ULD) was proposed to address this challenge by detecting a large variety of lesions from the whole body. There are multiple heterogeneously labeled datasets with varying label completeness: DeepLesion, the largest dataset with 32,735 annotated lesions of various types, in which an even larger number of lesion instances remain unannotated; and several fully labeled single-type lesion datasets, such as LUNA for lung nodules and LiTS for liver tumors. In this work, we propose a novel framework that leverages all of these datasets together to improve the performance of ULD. First, we learn a multi-head multi-task lesion detector using all datasets and generate lesion proposals on DeepLesion. Second, missing annotations in DeepLesion are retrieved by a new embedding-matching method that exploits clinical prior knowledge. Last, we discover suspicious but unannotated lesions using knowledge transfer from single-type lesion detectors. In this way, reliable positive and negative regions are obtained from partially labeled and unlabeled images and are effectively used to train the ULD model. To assess ULD under a clinically realistic 3D volumetric protocol, we fully annotated 1,071 CT sub-volumes in DeepLesion. Our method outperforms the current state-of-the-art approach by 29% in average sensitivity.
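
To make the second step concrete, the sketch below illustrates one plausible reading of embedding matching for recovering missing annotations: unannotated proposals are compared against verified lesion embeddings by cosine similarity, and close matches are treated as recovered positives. This is a minimal sketch under stated assumptions, not the paper's exact method; the function name, tensor shapes, and `sim_threshold` are illustrative, and the paper's criterion additionally incorporates clinical prior knowledge.

```python
import torch
import torch.nn.functional as F


def retrieve_missing_annotations(proposal_embs: torch.Tensor,
                                 annotated_embs: torch.Tensor,
                                 sim_threshold: float = 0.85) -> torch.Tensor:
    """Flag unannotated proposals whose embedding closely matches a
    verified lesion embedding, so they can be mined as positives.

    proposal_embs:  (N, D) embeddings of unannotated lesion proposals.
    annotated_embs: (M, D) embeddings of annotated (verified) lesions.
    Returns a boolean mask of length N selecting recovered positives.
    """
    # L2-normalize so the dot product equals cosine similarity.
    p = F.normalize(proposal_embs, dim=1)
    a = F.normalize(annotated_embs, dim=1)

    # (N, M) similarity matrix: each proposal vs. every annotated lesion.
    sim = p @ a.t()

    # Nearest annotated lesion per proposal; accept close matches.
    best_sim, _ = sim.max(dim=1)
    return best_sim >= sim_threshold


# Example usage with random embeddings (hypothetical 256-d feature space):
if __name__ == "__main__":
    proposals = torch.randn(100, 256)
    annotated = torch.randn(40, 256)
    mask = retrieve_missing_annotations(proposals, annotated)
    print(f"Recovered {int(mask.sum())} candidate positives out of 100")
```

A thresholded nearest-neighbor match of this kind yields the "reliable positive regions" the abstract refers to, while proposals far from any annotated lesion (and not flagged by the single-type detectors) can serve as reliable negatives.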