Advancing 3D Medical Image Analysis with Variable Dimension Transform based Supervised 3D Pre-training

The difficulties in both data acquisition and annotation substantially restrict the sample sizes of training datasets for 3D medical imaging applications. As a result, constructing high-performance 3D convolutional neural networks from scratch remains a difficult task in the absence of sufficient pre-trained parameters. Previous efforts on 3D pre-training have frequently relied on self-supervised approaches, which use either predictive or contrastive learning on unlabeled data to build invariant 3D representations. However, because of the unavailability of large-scale supervision information, obtaining semantically invariant and discriminative representations from these learning frameworks remains problematic. In this paper, we revisit an innovative yet simple fully-supervised 3D network pre-training framework to take advantage of semantic supervision from large-scale 2D natural image datasets. With a redesigned 3D network architecture, reformulated natural images are used to address the problem of data scarcity and develop powerful 3D representations. Comprehensive experiments on four benchmark datasets demonstrate that the proposed pre-trained models can effectively accelerate convergence while also improving accuracy for a variety of 3D medical imaging tasks such as classification, segmentation and detection. In addition, compared to training from scratch, the proposed pre-training can save up to 60% of annotation effort. On the NIH DeepLesion dataset, it also achieves state-of-the-art detection performance, outperforming earlier self-supervised and fully-supervised pre-training approaches, as well as methods trained from scratch. To facilitate further development of 3D medical models, our code and pre-trained model weights are publicly available at https://github.com/urmagicsmine/CSPR.