Virtual imaging trials improved the transparency and reliability of AI systems in COVID-19 imaging

The credibility of Artificial Intelligence (AI) models in medical imaging, particularly during the COVID-19 pandemic, has been challenged by reproducibility issues and obscured clinical insights. To address these concerns, we propose a Virtual Imaging Trials (VIT) framework that uses both clinical and simulated datasets to evaluate AI systems. This study focuses on convolutional neural networks (CNNs) for COVID-19 diagnosis from computed tomography (CT) and chest radiography (CXR). We developed and tested multiple AI models, based on 3D ResNet-like and 2D EfficientNetv2 architectures, across diverse datasets. Our evaluation metrics included the area under the receiver operating characteristic curve (AUC). Statistical analyses, such as the DeLong method for AUC confidence intervals, were employed to assess performance differences. Our findings demonstrate that VIT provides a robust platform for objective assessment, revealing significant influences of dataset characteristics, patient factors, and imaging physics on AI efficacy. Notably, models trained on the most diverse datasets showed the highest external testing performance, with AUC values ranging from 0.73 to 0.76 for CT and 0.70 to 0.73 for CXR. Internal testing yielded higher AUC values (0.77 to 0.85 for CT and 0.77 to 1.0 for CXR). This substantial drop in performance during external validation underscores the importance of diverse and comprehensive training and testing data. The VIT approach enhances model transparency and reliability, offering nuanced insights into the factors driving AI performance and bridging the gap between experimental and clinical settings. The study underscores the potential of VIT to improve the reproducibility and clinical relevance of AI systems in medical imaging.
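
The abstract does not specify how the DeLong intervals were computed. As a minimal sketch only, assuming a single binary classifier, a normal approximation, and classifier scores already split into positive and negative cases, a DeLong-style confidence interval for one AUC could look like the following. The function name `delong_auc_ci` and the clipping of the interval to [0, 1] are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import NormalDist, mean, variance


def delong_auc_ci(scores_pos, scores_neg, level=0.95):
    """Return (auc, lower, upper) for one classifier.

    Hypothetical helper: uses DeLong's structural components to estimate
    the variance of the empirical AUC, then a normal-approximation interval.
    """
    m, n = len(scores_pos), len(scores_neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two cases per class")

    def psi(x, y):
        # Pairwise kernel: 1 if the positive case outranks the negative,
        # 0.5 on ties, 0 otherwise.
        return 1.0 if x > y else 0.5 if x == y else 0.0

    # Structural components: per-case average of the kernel over the other class.
    v10 = [mean(psi(x, y) for y in scores_neg) for x in scores_pos]
    v01 = [mean(psi(x, y) for x in scores_pos) for y in scores_neg]

    auc = mean(v10)  # equals the Mann-Whitney estimate of the AUC
    var_auc = variance(v10) / m + variance(v01) / n  # sample variances (n - 1)

    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * math.sqrt(var_auc)
    # Assumption: clip to the valid AUC range rather than transform the scale.
    return auc, max(0.0, auc - half_width), min(1.0, auc + half_width)
```

Calling `delong_auc_ci(pos_scores, neg_scores)` on the scores of an external test set returns the point estimate with its interval. The pairwise loop is quadratic in the number of cases, which is adequate for illustration but slow for large cohorts.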