
CellForge: AI Agents Automate Virtual Cell Model Design

A multi-agent system called CellForge has been introduced by a collaborative team from Yale University, the University of Pennsylvania, Stanford University, Harvard University, Helmholtz Munich in Germany, and other institutions. The system, detailed in an arXiv preprint titled "CellForge: Agentic Design of Virtual Cell Models," is presented as the first fully autonomous framework for designing and generating virtual cell models from scratch, a significant step for AI in science. The team is led by Mark Gerstein and Smita Krishnaswamy of Yale, with researchers including Xiangru Tang, Zhi Huang of Penn, Yan Cui, Fang Wu of Stanford, Xihong Lin of Harvard, and Fabian Theis and Weixu Wang of Helmholtz Munich.

CellForge takes raw single-cell multi-omics data and a natural-language task description, such as modeling the effect of a drug treatment or gene knockout, and automatically generates optimized, executable models. These models simulate cellular responses to perturbations, yielding predictive insights into processes such as disease progression, immune response, and cancer development.

What sets CellForge apart is its multi-agent architecture. Rather than relying on a single AI model, the system employs specialized agents, each acting as a domain expert: a data analyst, a model designer, a biologist, or a training specialist. The agents engage in iterative debate, critiquing and refining one another's proposals over multiple rounds until they reach consensus on the best model design. This process mirrors real-world interdisciplinary collaboration, in which researchers review literature, challenge assumptions, and iteratively improve experimental plans. The system's workflow consists of three stages: task analysis, method design, and experimental execution.
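The propose-critique-concede loop described above can be sketched in a few lines of Python. Everything here is a hypothetical toy: the `Proposal` class, the numeric `score` field, and the concession rule are stand-ins for the natural-language critiques the actual CellForge agents exchange through an LLM.

```python
from dataclasses import dataclass, replace

@dataclass
class Proposal:
    author: str
    design: str
    score: float  # stand-in for peer-assessed quality of the design

def debate(proposals, max_rounds=10, concede_margin=0.9):
    """Toy sketch of multi-agent consensus formation.

    Each round, every agent compares its proposal against the current
    best one and concedes if its own scores well below it; the loop
    ends when all agents back the same design.
    """
    best = max(proposals, key=lambda p: p.score)
    for round_no in range(1, max_rounds + 1):
        best = max(proposals, key=lambda p: p.score)
        proposals = [
            p if p.score >= best.score * concede_margin
            else replace(p, design=best.design, score=best.score)
            for p in proposals
        ]
        if all(p.design == best.design for p in proposals):
            return best.design, round_no  # consensus reached
    return best.design, max_rounds  # no consensus; fall back to top proposal

# Toy run with four "expert" agents (names and scores are illustrative).
props = [
    Proposal("data analyst", "VAE", 0.60),
    Proposal("model designer", "transformer", 0.90),
    Proposal("biologist", "GNN", 0.70),
    Proposal("training specialist", "transformer", 0.85),
]
design, rounds = debate(props)
```

With these scores the two weaker proposals concede to the transformer design in the first round; in the real system, convergence can take several rounds of written critique rather than a single numeric comparison.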
In the first stage, CellForge analyzes the input data and retrieves relevant scientific literature to contextualize the task. During method design, agents propose diverse model architectures and training strategies, then critique each other's ideas through up to 10 rounds of discussion. Finally, the system translates the agreed-upon plan into fully functional Python code that handles data preprocessing, model training, hyperparameter tuning, validation, and result visualization, enabling end-to-end automation.

In benchmark tests across six diverse datasets involving gene knockouts, drug treatments, and cytokine stimulation, CellForge consistently outperformed state-of-the-art models such as scGPT, Geneformer, and ChemCPA. It also demonstrated strong cross-modal capability, achieving significant improvements on scATAC-seq and CITE-seq data, two modalities often handled separately in prior work. This flexibility allows CellForge to adapt rapidly to new data types and research questions without re-engineering.

A key innovation is the system's ability to autonomously generate high-quality, executable code, a rare feature among current AI-for-science platforms. Most AI tools stop at suggesting ideas or providing partial analyses; CellForge delivers a complete, ready-to-run research pipeline, drastically lowering the technical barrier for biologists without strong computational backgrounds. The team also implemented a hybrid search strategy inspired by depth-first and breadth-first exploration, enabling agents to conduct thorough literature and data analysis before proposing solutions. This grounds model designs in the literature, keeps them interpretable, and reduces the risk of hallucination or bias.
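One common way to combine depth-first and breadth-first exploration is a beam search over a design space: survey all options at each level (breadth), then dive deeper only along the most promising branches (depth). The sketch below illustrates that idea; the `OPTIONS` table, `expand`, and `score` functions are invented for illustration and are not CellForge's published algorithm.

```python
def hybrid_search(root, expand, score, beam_width=2, max_depth=4):
    """Toy hybrid breadth/depth exploration.

    Breadth: expand every candidate on the current frontier to survey
    the alternatives at that level. Depth: keep only the top-scoring
    candidates (a beam) and descend further along them.
    """
    frontier, best = [root], root
    for _ in range(max_depth):
        children = [child for node in frontier for child in expand(node)]
        if not children:
            break
        children.sort(key=score, reverse=True)   # survey, then rank
        frontier = children[:beam_width]         # dive only into the best
        if score(frontier[0]) > score(best):
            best = frontier[0]
    return best

# Toy design space: a "design" is a tuple of components chosen level by level.
OPTIONS = {
    0: ["scRNA encoder", "ATAC encoder"],
    1: ["transformer", "VAE"],
    2: ["MSE loss", "NB loss"],
}

def expand(node):
    return [node + (opt,) for opt in OPTIONS.get(len(node), [])]

def score(node):
    # Illustrative heuristic favoring transformer backbones and NB loss.
    return sum(part in ("transformer", "NB loss") for part in node)

best = hybrid_search((), expand, score)
```

In CellForge the "levels" would correspond to literature- and data-grounded design decisions rather than a fixed lookup table, and the ranking would come from agent deliberation rather than a numeric heuristic.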
Compared with general-purpose research-automation frameworks such as Biomni and DeepResearch, CellForge showed superior performance in task specificity, model quality, and expert evaluation, particularly in cross-modal prediction and human-assessed model relevance. Researchers who have tested CellForge report that it can generate models superior to those designed by human experts, compressing work that once took years into days or even hours.

The system's potential extends beyond academic research. In drug development, for example, CellForge could simulate how cells respond to candidate compounds in silico, helping prioritize the most promising leads before costly lab or clinical trials. Looking ahead, the team aims to integrate CellForge with automated laboratory systems, creating a closed-loop platform in which AI designs experiments, runs them in robotic labs, and iteratively refines models based on real-world results, a vision of scalable, industrialized science.

The project builds on prior work by Gerstein and Tang, including MedAgents for medical diagnosis and ChemAgent for chemical reasoning, reflecting a sustained effort to apply multi-agent systems to complex scientific problems. CellForge's code and preprint are now publicly available on GitHub and arXiv for global use and improvement. The researchers emphasize that the goal is not to replace scientists but to empower them with a capable AI collaborator, enabling human-AI co-discovery in biology and medicine. As the paper concludes, CellForge is more than a tool: it is a step toward AI scientists capable of autonomous hypothesis generation, model design, and experimental planning. With its ability to rapidly explore vast scientific design spaces, it may accelerate discovery in fields from cancer research to synthetic biology, fundamentally transforming how science is done.
