AI in Lung Health: Benchmarking Detection and Diagnostic Models Across Multiple CT Scan Datasets

Lung cancer remains the leading cause of cancer-related mortality worldwide, and early detection through low-dose computed tomography (LDCT) has shown significant promise in reducing death rates. With the growing integration of artificial intelligence (AI) into medical imaging, the development and evaluation of robust AI models require access to large, well-annotated datasets. In this study, we demonstrate the utility of the Duke Lung Cancer Screening (DLCS) Dataset, the largest open-access LDCT dataset, with over 2,000 scans and 3,000 expert-verified nodules. We benchmark deep learning models for both 3D nodule detection and lung cancer classification across internal and external datasets, including LUNA16, LUNA25, and NLST-3D+. For detection, we develop two MONAI-based RetinaNet models (DLCSDmD and LUNA16-mD), evaluated using the Competition Performance Metric (CPM). For classification, we compare five models: two state-of-the-art pretrained models (Models Genesis and Med3D), a self-supervised foundation model (FMCB), a randomly initialized ResNet50, and our proposed Strategic Warm-Start++ (SWS++) model. SWS++ uses curated candidate patches to pretrain a classification backbone within the same detection pipeline, enabling task-relevant feature learning. Our models demonstrated strong generalizability, with SWS++ achieving performance comparable or superior to existing foundation models across multiple datasets (AUC: 0.71 to 0.90). All code, models, and data are publicly released to promote reproducibility and collaboration. This work establishes a standardized benchmarking resource for lung cancer AI research, supporting future efforts in model development, validation, and clinical translation.
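For readers unfamiliar with the detection metric named above, the following is a minimal sketch of CPM as it is defined in the LUNA16 challenge: the mean sensitivity at seven false-positive rates (1/8, 1/4, 1/2, 1, 2, 4, and 8 false positives per scan). It assumes the study follows that convention. The function name `cpm` and its arguments are hypothetical and are not taken from the released code. The sketch also assumes each detection has already been matched to at most one ground-truth nodule, and it ignores ties in confidence scores.

```python
# False positives per scan at which sensitivity is sampled (LUNA16 convention).
CPM_FP_RATES = (0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0)


def cpm(scores, is_hit, num_nodules, num_scans, fp_rates=CPM_FP_RATES):
    """Mean sensitivity over the given false-positive-per-scan rates.

    scores      -- confidence of each detection, pooled over all scans
    is_hit      -- True if the detection matches a distinct true nodule
    num_nodules -- total number of ground-truth nodules
    num_scans   -- total number of scans evaluated
    """
    # Sweep the threshold from the most to the least confident detection.
    ranked = sorted(zip(scores, is_hit), key=lambda pair: -pair[0])

    # Build the FROC curve: one (FPs per scan, sensitivity) point per detection.
    curve = []
    tp = fp = 0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        curve.append((fp / num_scans, tp / num_nodules))

    # Treat the curve as a step function: take the best sensitivity
    # reachable without exceeding each false-positive budget.
    sens_at_rates = []
    for rate in fp_rates:
        reachable = [sens for fps, sens in curve if fps <= rate]
        sens_at_rates.append(max(reachable, default=0.0))

    return sum(sens_at_rates) / len(fp_rates)
```

As a usage example, `cpm([0.9, 0.8, 0.7, 0.6, 0.5], [True, False, True, True, False], num_nodules=4, num_scans=2)` averages sensitivities of 0.25 at the two strictest rates and 0.75 at the remaining five. Published evaluation scripts may interpolate between curve points rather than use a step function, so values from this sketch can differ slightly from official scores.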