CellFM: a large-scale foundation model pre-trained on transcriptomics of 100 million human cells

Yuansong Zeng, Jiancong Xie, Ningyuan Shangguan, Zhuoyi Wei, Wenbing Li, Yun Su, Shuangyu Yang, Chengyang Zhang, Jinbo Zhang, Nan Fang, Hongyu Zhang, Yutong Lu, Huiying Zhao, Jue Fan, Weijiang Yu, Yuedong Yang
Published: 6/6/2025
Abstract

Single-cell sequencing provides transcriptomic profiling at single-cell resolution, uncovering cellular heterogeneity with unprecedented precision. Yet current single-cell data analysis suffers from inherent data noise, batch effects, and sparsity, highlighting the need for a unified model to represent cellular states. To address this problem, many recent efforts have focused on training single-cell foundation models on large datasets. However, current human foundation models remain limited by the size of their training data and model parameters. Here, we have collected a diverse dataset of 100 million human cells, on which we train a single-cell foundation model (CellFM) containing 800 million parameters. To balance efficiency and performance, the model is trained through a modified RetNet framework on the MindSpore platform. Extensive experiments show that CellFM outperforms existing models in cell annotation, perturbation prediction, gene function prediction, and capturing gene-gene relationships.
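
For readers unfamiliar with the backbone: RetNet (Sun et al., 2023) replaces softmax attention with a retention mechanism whose parallel form is Retention(X) = (QK^T ⊙ D)V, where D[n, m] = γ^(n−m) for n ≥ m and 0 otherwise, so each position attends causally with exponential decay. The NumPy sketch below illustrates only this vanilla single-head parallel retention, omitting RetNet's xPos-style rotation, multi-scale heads, and group normalization; it is not CellFM's modified implementation, and the shapes and decay value γ = 0.9 are illustrative assumptions.

```python
import numpy as np

def retention(X, W_q, W_k, W_v, gamma):
    """Parallel-form retention: (Q K^T * D) V, with causal decay mask D."""
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    n = X.shape[0]
    idx = np.arange(n)
    diff = idx[:, None] - idx[None, :]        # position gap i - j
    # D[i, j] = gamma**(i - j) on and below the diagonal, 0 above it.
    D = np.tril(gamma ** np.maximum(diff, 0))
    return (Q @ K.T * D) @ V

# Toy usage: 6 tokens (e.g., gene embeddings) of width 8 -- hypothetical sizes.
rng = np.random.default_rng(0)
n, d = 6, 8
X = rng.standard_normal((n, d))
W_q, W_k, W_v = (rng.standard_normal((d, d)) for _ in range(3))
print(retention(X, W_q, W_k, W_v, gamma=0.9).shape)  # (6, 8)
```

Because the decay mask is fixed rather than data-dependent, retention also admits an equivalent recurrent form with constant memory per step, which is the efficiency property that makes a RetNet-style backbone attractive for inputs as long as whole-transcriptome gene profiles.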