Shift Bioscience Releases Enhanced AI Framework for Genetic Perturbation Modeling
Shift Bioscience, a biotechnology company based in Cambridge, England, has released new research that challenges recent criticisms of AI-powered virtual cell models, demonstrating that these models can outperform simple baselines when evaluated using properly calibrated metrics. The study, which focuses on deep learning-based genetic perturbation response models, provides a critical framework for improving the accuracy of performance assessments in computational biology.

Genetic perturbation models are a type of AI virtual cell used to predict how cells respond to changes in gene expression, such as gene overexpression or suppression. These models offer a fast, cost-effective alternative to traditional wet lab experiments, enabling researchers to rapidly screen potential therapeutic targets. However, some recent studies have cast doubt on their reliability, reporting that they often fail to outperform basic, uninformative baselines such as mean or control predictions, especially in datasets with subtle or weak perturbations.

Shift Bioscience’s research identifies the root cause of these misleading results: metric miscalibration. The team found that commonly used evaluation metrics are often ill-suited to distinguishing meaningful biological signals from random noise, particularly in low-impact perturbation scenarios. This leads to an underestimation of model performance and fuels skepticism about the utility of virtual cell models.

To address this, the researchers developed a new framework for benchmark metric calibration. Using 14 publicly available perturb-seq datasets, each measuring gene expression changes after specific genetic manipulations, the team tested a range of metrics. They identified several rank-based and differentially expressed gene (DEG)-aware metrics that are robust, well-calibrated, and consistent across diverse datasets.
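The miscalibration problem described above can be sketched with a toy simulation (illustrative only: the data, metric definitions, and thresholds here are our own assumptions, not the paper's actual datasets or benchmark metrics). A baseline that simply predicts the unperturbed control profile earns a near-perfect global correlation with the observed data, because gene-to-gene variance swamps the weak perturbation signal; a rank-based, DEG-aware metric separates that baseline from a model that actually captures the perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes = 2000

# Simulated control expression profile and a weak perturbation that
# shifts only 20 truly differentially expressed genes (DEGs).
control = rng.normal(5.0, 2.0, size=n_genes)
deg_idx = np.arange(20)
effect = np.zeros(n_genes)
effect[deg_idx] = 1.5
observed = control + effect + rng.normal(0.0, 0.1, size=n_genes)

# An uninformative baseline simply predicts the control profile;
# a hypothetical model recovers most of the perturbation signal.
baseline_pred = control
model_pred = control + 0.8 * effect + rng.normal(0.0, 0.3, size=n_genes)

# Global Pearson correlation is dominated by gene-to-gene variance,
# so both predictions look near-perfect and are hard to tell apart.
pearson_baseline = np.corrcoef(baseline_pred, observed)[0, 1]
pearson_model = np.corrcoef(model_pred, observed)[0, 1]

# A DEG-aware, rank-based alternative: recall of the true DEGs among
# the top-k genes ranked by predicted change from control.
def deg_recall_at_k(pred, k=20):
    pred_change = np.abs(pred - control)
    top_pred = np.argsort(pred_change, kind="stable")[-k:]
    return np.intersect1d(top_pred, deg_idx).size / k

recall_baseline = deg_recall_at_k(baseline_pred)  # 0.0: predicts no change
recall_model = deg_recall_at_k(model_pred)        # well above the baseline
```

Under this setup both predictions score roughly 0.99 on global Pearson correlation, making the uninformative baseline look as good as the model, while the DEG recall metric gives the baseline a score of zero and rewards only the prediction that ranks the truly perturbed genes highly.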
When virtual cell models were evaluated using these improved metrics, they consistently outperformed uninformative baselines, including mean, control, and linear models. The results show that the models are capable of capturing biologically relevant patterns in gene expression, provided the right evaluation methods are used. This finding directly contradicts prior claims that AI virtual cells are ineffective, suggesting that the problem lies not with the models themselves, but with the tools used to assess them.

Henry Miller, Ph.D., Head of Machine Learning at Shift Bioscience, emphasized the importance of this work: “The reports of poor performance in AI virtual cells are largely due to limitations in the metrics, not the models. When we use well-calibrated metrics, the models perform very well and consistently outperform key baselines. This work paves the way for broader adoption of virtual cells in drug discovery and reinforces our confidence in the models driving our own target identification pipeline for cell rejuvenation.”

The company’s research is particularly relevant in the context of aging and regenerative medicine, as Shift Bioscience aims to uncover the biological mechanisms of cellular aging to develop interventions that reduce age-related disease and extend healthy lifespan. By improving the reliability of virtual cell models, the company hopes to accelerate the discovery of new therapeutic targets and reduce the time and cost of early-stage research.

The findings have significant implications for the broader field of computational biology. As AI models become increasingly central to drug development, the need for accurate, standardized evaluation methods is critical. Shift Bioscience’s work provides a clear path forward, highlighting that the performance of AI models must be judged not just by the model’s design, but by the quality of the metrics used to measure it.
The study, titled “Deep Learning-Based Genetic Perturbation Models Do Outperform Uninformative Baselines on Well-Calibrated Metrics,” is a key step in validating the use of AI in biological research and may help resolve ongoing debates about the real-world value of virtual cell models.
