Miraomics, Pythia Biosciences, and LatchBio Unveil 30 Million Cell Atlas and AI-Driven Data Curation Tools to Accelerate Biomedical Research
SAN FRANCISCO -- (BUSINESS WIRE) -- Miraomics, Pythia Biosciences, and LatchBio have jointly released a 30 million cell atlas covering more than 150 indications, 200 tissue types, and 27 measurement technologies, all curated from public sources. The dataset aims to bridge the gap between the vast amount of available biological data and the structured, annotated information needed for advanced bioinformatics and machine learning applications.

Engineering biology increasingly relies on sophisticated statistical models to understand living systems whose complexity exceeds what humans can reason about unaided. Millions of single-cell transcriptomes are scattered across the internet, but they often go unused because of the significant human effort required to organize and annotate them for practical applications.

Public datasets currently represent the largest repository of single-cell RNA sequencing (scRNA-seq) data, offering unparalleled diversity in diseases, tissues, and patient populations. Purpose-built industrial data generation projects, such as perturbation atlases, provide valuable insights, but they lack the broad observational data needed to address a wide range of translational contexts, particularly those involving rare diseases with limited patient populations.

Companies like Pythia Biosciences and Miraomics specialize in curating these molecular datasets, making them accessible and usable for large-scale bioinformatics and machine learning initiatives. LatchBio’s white-labeled data infrastructure and portal distribute the curated data so that biopharma and biotech companies can put it to use.

Tristan Gill, Co-Founder and CEO of Pythia Biosciences, highlighted the significance of this collaboration: "By working with innovative partners like LatchBio and Miraomics, we can broaden the reach of our high-quality, expertly curated scientific content. This marks the first in a series of releases where portions of the Pythiomics multi-omics database, renowned for its depth, precision, and scientific rigor, will be easily accessible via the Latch platform."

Eugene Bolotin, Co-Founder and CEO of Miraomics, added: "We are thrilled to announce the release of this major, high-quality curated data set, which represents thousands of hours of curation effort. It opens up new possibilities for the development of advanced AI tools and provides novel insights into basic science, disease progression, and drug discovery."

In addition to the cell atlas, LatchBio has introduced a suite of agentic molecular curation tools designed to streamline data curation. These tools cut per-dataset curation time roughly 40-fold and improve the quality and consistency of annotations by integrating information from entire research papers and unstructured supplements. In certain scenarios, the curation framework can fully automate the process. A detailed whitepaper explaining the design and functionality of these tools is available on the LatchBio website.

Kenny Workman, Co-Founder and CTO of LatchBio, stated: "Our goal is to organize the world's public molecular data for immediate access, benefiting small biotechs, large pharma companies, and cutting-edge AI laboratories. Partnering with leading solution providers is essential to achieving this vision."

This collaboration underscores the growing importance of structured, high-quality biological data in advancing AI and biotechnology, with the ultimate aim of accelerating breakthroughs in medical research and drug development.