CellForge: Agentic Design of Virtual Cell Models

Virtual cell modeling is an emerging frontier at the intersection of artificial intelligence and biology that aims to quantitatively predict cellular responses to diverse perturbations. However, autonomously building computational models for virtual cells is challenging because of the complexity of biological systems, the heterogeneity of data modalities, and the need for domain-specific expertise across multiple disciplines. Here, we introduce CellForge, an agentic system that uses a multi-agent framework to transform provided biological datasets and research objectives directly into optimized computational models for virtual cells. More specifically, given only raw single-cell multi-omics data and task descriptions as input, CellForge outputs both an optimized model architecture and executable code for training virtual cell models and running inference. The framework integrates three core modules: Task Analysis, which characterizes the provided dataset and retrieves relevant literature; Method Design, in which specialized agents collaboratively develop optimized modeling strategies; and Experiment Execution, which automatically generates code. The agents in the Method Design module are divided into experts with differing perspectives and a central moderator, and they exchange candidate solutions until they reach a reasonable consensus. We demonstrate CellForge's capabilities in single-cell perturbation prediction, using six diverse datasets that encompass gene knockouts, drug treatments, and cytokine stimulations across multiple modalities. CellForge consistently outperforms task-specific state-of-the-art methods. Overall, CellForge demonstrates how iterative interaction between LLM agents with differing perspectives yields better solutions than addressing a modeling challenge directly. Our code is publicly available at https://github.com/gersteinlab/CellForge.
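The moderated expert discussion described above can be illustrated with a minimal sketch. This is not CellForge's implementation: the `Expert` class, the `moderate` function, the integer 0-10 scoring scale, the consensus rule (every expert rates the leading proposal at or above a threshold), and the proposal labels `"vae"` and `"transformer"` are all assumptions made for illustration, and the toy scorers stand in for LLM calls.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical: a scorer maps (proposal label, round number) to a score in 0..10.
Scorer = Callable[[str, int], int]


@dataclass
class Expert:
    name: str
    score: Scorer  # stands in for an LLM call (assumption of this sketch)


def toy_scorer(base: Dict[str, int], favored: str, step: int = 1) -> Scorer:
    """Toy stand-in for persuasion: the score for `favored` rises each round."""
    def score(proposal: str, round_idx: int) -> int:
        bonus = step * (round_idx - 1) if proposal == favored else 0
        return min(10, base[proposal] + bonus)
    return score


def moderate(experts: List[Expert], proposals: List[str],
             threshold: int = 7, max_rounds: int = 5) -> dict:
    """Moderator loop: repeat rounds until every expert rates the
    top-ranked proposal at or above `threshold`, or rounds run out."""
    best: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        mean_scores: Dict[str, float] = {}
        min_scores: Dict[str, int] = {}
        for p in proposals:
            scores = [e.score(p, round_idx) for e in experts]
            mean_scores[p] = sum(scores) / len(scores)
            min_scores[p] = min(scores)
        # The moderator ranks proposals by mean score across experts.
        best = max(proposals, key=lambda p: mean_scores[p])
        # Consensus rule (assumed): no expert rates the leader below threshold.
        if min_scores[best] >= threshold:
            return {"proposal": best, "rounds": round_idx, "consensus": True}
    return {"proposal": best, "rounds": max_rounds, "consensus": False}


# Illustrative experts and proposals; names and scores are invented.
proposals = ["vae", "transformer"]
experts = [
    Expert("data", toy_scorer({"vae": 6, "transformer": 5}, "transformer")),
    Expert("model", toy_scorer({"vae": 4, "transformer": 7}, "transformer")),
    Expert("biology", toy_scorer({"vae": 5, "transformer": 6}, "transformer")),
]
result = moderate(experts, proposals)
print(result)
```

With these invented scores, the loop stops in the third round, when the lowest expert score for the leading proposal first reaches the threshold. In a real system the scorers would be replaced by agent calls that also revise the proposals between rounds, which this sketch does not model.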