AI Model Decodes Long-Range DNA Signals Controlling RNA Splicing
Researchers at the University of Tokyo have developed SpliceSelectNet, a hierarchical Transformer framework that accurately models the long-range genomic interactions essential for RNA splicing. Published recently in Nucleic Acids Research, the architecture overcomes a longstanding computational barrier by analyzing DNA sequences of up to 100,000 base pairs while maintaining single-nucleotide resolution.
Traditional AI models for genomic analysis often rely on architectures adapted from natural language processing, which limits their ability to capture regulatory signals located thousands of base pairs away from a target splice site. SpliceSelectNet addresses this by dividing a long DNA sequence into manageable blocks, applying local attention to identify patterns within each block, and then integrating those results through a hierarchical global attention step. This design preserves computational efficiency while still allowing dense, sequence-wide analysis. The framework also generates attention maps that highlight biologically significant regulatory regions, pairing predictive accuracy with biological interpretability.
In benchmarking against leading splice-prediction systems, the model achieved state-of-the-art performance across multiple validation datasets. It detected aberrant splicing patterns and remained sensitive to distant regulatory variants, including in simulations involving the DMD gene and in clinical pathogenic variants sourced from ClinVar. The researchers showed that attention scores consistently align with known functional genomic elements, supporting the system's capacity to decode complex splicing mechanisms beyond the reach of conventional convolutional neural networks.
Beyond splice-site prediction, the hierarchical Transformer architecture has broader applications in computational genomics. The framework is positioned to support future investigations into promoter-enhancer interactions, three-dimensional genome organization, and foundational DNA language models. In clinical and pharmaceutical contexts, the system offers new capabilities for interpreting variants of uncertain significance in noncoding regions and for accelerating the design of oligonucleotide therapeutics that target abnormal splicing.
According to lead researchers Professor Kenta Nakai and Ph.D. candidate Yuna Miyachi, the system represents a paradigm shift toward biologically aligned AI architectures. By explicitly accounting for long-range genomic dependencies while preserving single-nucleotide resolution, the framework provides a robust tool for precision genomic medicine, variant screening, and the decoding of the complex regulatory networks that govern gene expression.
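For readers who want a concrete picture of the block-then-global attention idea described in this article, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names, the one-hot encoding, the use of attention without learned weights, the mean pooling used to summarize each block, and the addition used to combine local and global results are all illustrative assumptions.

```python
import math

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as 4-dimensional one-hot vectors (unknown bases become all zeros)."""
    return [[1.0 if b == base else 0.0 for base in BASES] for b in seq.upper()]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Scaled dot-product self-attention with no learned projections (assumption).

    Queries, keys, and values are all the input vectors themselves.
    Returns (outputs, weights), where weights[i][j] is how much item i attends to item j.
    """
    if not vectors:
        return [], []
    dim = len(vectors[0])
    scale = math.sqrt(dim)
    weights = []
    for q in vectors:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in vectors]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * v[d] for w, v in zip(row, vectors)) for d in range(dim)]
        for row in weights
    ]
    return outputs, weights


def hierarchical_attention(seq, block_size):
    """Two-stage attention: local within blocks, then global across block summaries."""
    x = one_hot(seq)
    # Stage 0: split the sequence into fixed-size blocks (the last one may be shorter).
    blocks = [x[i:i + block_size] for i in range(0, len(x), block_size)]
    # Stage 1: local attention inside each block.
    local_out = [self_attention(block)[0] for block in blocks]
    # Summarize each block by mean pooling (hypothetical choice of summary).
    summaries = [[sum(col) / len(block) for col in zip(*block)] for block in local_out]
    # Stage 2: global attention across the block summaries.
    global_out, global_weights = self_attention(summaries)
    # Give every position its local result plus its block's global context,
    # so the output keeps one vector per nucleotide.
    per_position = []
    for block, context in zip(local_out, global_out):
        for vec in block:
            per_position.append([a + b for a, b in zip(vec, context)])
    return per_position, global_weights
```

The sketch also shows why this layout can be cheaper than full attention. For a sequence of n positions and a block size of b, the local stage compares roughly n × b pairs and the global stage roughly (n / b)² pairs, instead of n² for attention over every pair of positions. With n = 100,000 and a hypothetical block size of 1,000, that is about 10⁸ local comparisons and 10⁴ global ones, against 10¹⁰ for full attention. The `global_weights` matrix plays a role loosely analogous to the attention maps mentioned above, but only at block level in this simplified version.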
