
Abstract

Multi-template polymerase chain reaction (PCR) is a critical technique that enables the parallel amplification of diverse DNA molecules, facilitating applications in fields ranging from quantitative molecular biology to DNA data storage. However, sequence-specific differences in amplification efficiency often cause non-homogeneous amplification, which skews abundance data and compromises accuracy and sensitivity. In this study, we address amplification efficiency in complex amplicon libraries by employing one-dimensional convolutional neural networks (1D-CNNs) to predict sequence-specific amplification efficiencies from sequence information alone. Trained on reliably annotated datasets derived from synthetic DNA pools, these models achieve high predictive performance (AUROC: 0.88, AUPRC: 0.44), enabling the design of inherently homogeneous amplicon libraries. We further introduce CluMo, a deep-learning interpretation framework that identifies specific motifs adjacent to adapter priming sites as closely associated with poor amplification. This insight leads us to identify adapter-mediated self-priming as the major mechanism causing low amplification efficiency, challenging long-standing assumptions in PCR design. By addressing the basis of non-homogeneous amplification in multi-template PCR, our deep-learning approach reduces the sequencing depth required to recover 99% of amplicon sequences by a factor of four, and opens new avenues for improving the efficiency of DNA amplification in fields such as genomics, diagnostics, and synthetic biology.
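Why small sequence-specific efficiency differences produce strongly skewed abundances can be illustrated with the textbook exponential PCR model. This is a minimal sketch, assuming an idealized model in which a template with constant per-cycle efficiency E grows as N_n = N_0 (1 + E)^n; the model is a simplification not taken from the study, and the function names and the efficiencies 0.95 and 0.80 are hypothetical values chosen only for illustration.

```python
# Idealized exponential PCR model (assumption, not the study's method):
# after n cycles, a template with per-cycle efficiency E (0 <= E <= 1)
# has abundance N0 * (1 + E) ** n.


def abundance(n0: float, efficiency: float, cycles: int) -> float:
    """Copy number after `cycles` rounds under the idealized model."""
    if not 0.0 <= efficiency <= 1.0:
        raise ValueError("efficiency must lie in [0, 1]")
    return n0 * (1.0 + efficiency) ** cycles


def fold_skew(eff_high: float, eff_low: float, cycles: int) -> float:
    """Ratio of final abundances for two templates starting at equal copy number."""
    return abundance(1.0, eff_high, cycles) / abundance(1.0, eff_low, cycles)


if __name__ == "__main__":
    # Hypothetical efficiencies: the skew grows geometrically with cycle number.
    for cycles in (10, 20, 30):
        print(cycles, round(fold_skew(0.95, 0.80, cycles), 1))
```

Under this model the abundance ratio between two templates is ((1 + E_high) / (1 + E_low))^n, so even a modest efficiency gap compounds with every cycle, leaving poorly amplifying sequences increasingly under-represented and requiring deeper sequencing to recover them.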
