Chinese Academy of Sciences releases RNA splicing prediction model
Researchers at the Beijing Institute of Genomics, Chinese Academy of Sciences, have developed a computational model named HELIX to predict RNA splicing regulation at the transcript level. RNA splicing is a critical biological process that generates diverse transcripts from a single gene, directly influencing their sequence, structure, and function. While long-read sequencing technologies offer promising tools for identifying full-length transcripts across different tissues, the scarcity of such data, caused by high costs and sampling difficulties, has hindered the analysis of splicing in complex physiological states. Existing algorithms primarily focus on predicting splice site strength in individual samples but struggle with transcript-level predictions and with generalizing to unknown tissues.
HELIX addresses these limitations by integrating genomic sequences with tissue-specific RNA-binding protein expression matrices. The model captures context-dependent splicing regulation through a hierarchical deep learning architecture. Nested sub-models first predict splice sites and their baseline intensities from DNA sequences, then incorporate expression features from 1,499 RNA-binding proteins to forecast splicing regulation levels for specific samples. Finally, an embedding-inherited long short-term memory network analyzes dependencies between multiple splice sites.
Evaluation results indicate that HELIX outperforms current mainstream methods in predicting highly regulated sites and relative transcript abundance. In practical applications, the team used HELIX to analyze a large cohort of colorectal cancer patients. The model identified aberrant splicing and transcript usage patterns in tumor cells, revealing strong correlations with genomic mutations, abnormal RNA-binding protein expression, and clinical characteristics.
These findings provide molecular insights into tumor mechanisms and patient stratification. To further address intratumoral heterogeneity, the researchers extended HELIX into a single-cell version called scHELIX, which predicts differences in transcript usage across cell types and tumor subpopulations. Analysis revealed distinct splicing and expression signatures between tumor subclones, offering new perspectives for understanding tumor evolution and identifying therapeutic targets.
The study enhances the understanding of tissue-specific and disease-associated splicing mechanisms and establishes a methodological foundation for cancer subtyping, interpreting pathogenic variants, and advancing precision medicine. The research, supported by the National Natural Science Foundation of China, was published in Nature Computational Science. This development represents a significant step forward in utilizing computational biology to decode complex genetic regulatory networks in health and disease.
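
The three-stage strategy reported for HELIX (a sequence-based baseline, an adjustment from RNA-binding protein expression, then a pass that links neighbouring splice sites) can be illustrated with a toy sketch. This is not the published model: every function name, weight, and value below is hypothetical, the sequence rule is a deliberately crude stand-in for a learned network, and a plain recurrence replaces the long short-term memory network. Only three made-up RNA-binding proteins are used instead of 1,499.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def baseline_strength(window):
    """Stage 1 (toy): baseline strength of a candidate donor site from sequence alone.

    Hypothetical rule: a window starting with the canonical GT dinucleotide
    gets a high baseline, anything else a low one.
    """
    return 0.9 if window.startswith("GT") else 0.1


def sample_specific_usage(baseline, rbp_expression, rbp_weights):
    """Stage 2 (toy): shift the baseline in logit space using RBP expression."""
    shift = sum(w * x for w, x in zip(rbp_weights, rbp_expression))
    return sigmoid(logit(baseline) + shift)


def couple_sites(usages, carry=0.5):
    """Stage 3 (toy): a simple recurrence standing in for the LSTM.

    Each output mixes the site's own usage with a running state carried
    from upstream sites, so sites are no longer scored independently.
    """
    state, out = usages[0], []
    for u in usages:
        state = carry * state + (1.0 - carry) * u
        out.append(state)
    return out


# Made-up inputs: three candidate sites and expression of three RBPs in one sample.
windows = ["GTAAGT", "CCAAGT", "GTGAGT"]
rbp_expression = [1.2, 0.0, 0.4]
rbp_weights = [0.5, -1.0, 0.25]

baselines = [baseline_strength(w) for w in windows]
usages = [sample_specific_usage(b, rbp_expression, rbp_weights) for b in baselines]
coupled = couple_sites(usages)
```

The design point the sketch tries to convey is the separation of concerns: the baseline depends only on sequence, so it is shared across samples, while the expression-driven shift is what makes the prediction specific to a tissue or sample. With all expression values set to zero, stage 2 returns the baseline unchanged.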
