Online Tutorial | 150 Specialized Tools, 59 Databases, 105 Software Packages: Biomni Surpasses Expert-Level Efficiency in 8 Real Research Tasks

Modern biomedical research is caught between a data explosion and an efficiency bottleneck. On the one hand, technologies such as gene sequencing and single-cell analysis have produced massive multimodal data, from genomic sequences to clinical imaging, and from microbiome species abundance to small-molecule metabolomic profiles, with data volumes now at the petabyte scale. On the other hand, fragmented research workflows severely limit the pace of discovery: a typical multi-omics analysis may require calling more than ten tools, querying dozens of databases, and consulting hundreds of papers. Researchers usually do this work by hand, which is both time-consuming and error-prone.
However, most existing AI tools are "specialized", such as models that focus on CRISPR experiment design or single-cell annotation. They can handle only a single task and struggle to work across domains. When research spans genetics and pharmacology, or needs to integrate clinical data with basic research findings, these tools fall short. Building a general-purpose biomedical agent that can reason across disciplinary boundaries and make autonomous decisions like a human scientist has therefore become key to breaking the current research bottleneck.
To this end, Stanford University, together with Genentech, Arc Institute, UCSF, and other institutions, has developed Biomni, described as the first general-purpose biomedical AI agent. It can autonomously carry out a wide range of research tasks across biomedical subfields. The team first built a unified agent environment by mining the essential tools, databases, and protocols from tens of thousands of publications in 25 biomedical fields. On top of this environment, Biomni uses a general agent architecture that combines large language model (LLM) reasoning with retrieval-augmented planning and code-based execution, which lets it dynamically compose and run complex biomedical workflows without relying on predefined templates or rigid task flows. Systematic benchmarking shows that Biomni generalizes well across heterogeneous biomedical tasks without any task-specific prompt tuning.
Paper address:
One-click deployment tutorial link:
The core goal of Biomni is to build a general-purpose biomedical AI agent that needs no predefined templates and can complete cross-domain research tasks autonomously. This involves three capabilities:
* Breaking through task-specific limitations: Biomni aims to handle tasks ranging from "rare disease diagnosis" to "microbiome difference analysis" using only natural-language instructions.
* Integrating multimodal capabilities: Biomni aims to connect the whole pipeline from data to experiments. The closed loop of biomedical research runs "data input → analysis and reasoning → hypothesis generation → experimental design → result verification", and Biomni needs to cover every step: it can process wearable-device data in Excel format and single-cell data in h5ad format, generate Python code for analysis, and design PCR experiments to test hypotheses.
* Establishing a new paradigm of human-machine collaboration: Biomni is positioned not as a replacement for scientists but as a "super assistant" that automates repetitive work such as data cleaning and literature retrieval, so that researchers can focus on forming hypotheses and designing new studies.
Dataset: A three-layer dataset for building a biomedical knowledge base
Biomni's capabilities come from its systematic integration of biomedical research resources. By building a three-layer dataset, the team created a "digital laboratory" for AI that covers tools, data, and tasks.
First, to define the basic operational units of biomedical research, the team selected the 100 most recent papers published in 2024 from each of the 25 bioRxiv subject categories (such as genomics, microbiology, and pharmacology). An "action discovery agent" then analyzed the research workflow of each paper and extracted four core elements: tasks, tools, software packages, and databases. The database layer contains 59 core resources in two categories: large databases accessible via API (such as PDB for protein structures and ClinVar for clinical variants) and locally deployed structured datasets (such as GWAS summary statistics and microbiome reference genomes).
Second, to test generalization, the team built a multi-level evaluation set consisting of general knowledge benchmarks and a real-world task set. The general knowledge benchmarks include LAB-Bench (with DbQA database question answering and SeqQA sequence reasoning) and Humanity's Last Exam (HLE, covering 14 biomedical subfields). These benchmarks do not depend on specific tools and focus on basic reasoning ability. The real-world task set contains 8 cross-domain tasks, each corresponding to an actual research scenario.
Finally, to demonstrate practical value, the team selected three types of representative data for case studies:
* Wearable device data: 458 Excel files from 30 participants, including continuous glucose monitoring (CGM) and temperature data (covering 2 hours before meals to 4 hours after meals), and 227 nights of sleep records (including sleep duration, efficiency, stage, etc.);
* Multi-omics data: a single-cell dataset of human embryonic skeletal development (snRNA-seq and snATAC-seq data from 336,000 nuclei), as well as multi-omics data covering 652 lipids, 731 metabolites, and 1,470 proteins;
* Wet-lab data: 10 cloning tasks (covering Golden Gate, Gibson, and other methods), as well as CRISPR vector construction experiments targeting the B2M gene, used to validate the experimental plans designed by Biomni.
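The meal-centered window used for the wearable data (2 hours before a meal to 4 hours after) can be pictured as a simple time filter. The following is a minimal sketch, not Biomni's own code: the function name `meal_window`, the `(timestamp, glucose)` tuple layout, and the synthetic readings are all assumptions made for illustration.

```python
from datetime import datetime, timedelta

def meal_window(readings, meal_time, before_hours=2, after_hours=4):
    """Return the (timestamp, glucose) readings inside the meal-centered window.

    `readings` is a list of (datetime, float) tuples; this layout is an
    assumption for illustration, not the format of the study's Excel files.
    """
    start = meal_time - timedelta(hours=before_hours)
    end = meal_time + timedelta(hours=after_hours)
    return [(t, g) for t, g in readings if start <= t <= end]

# Synthetic hourly readings from 08:00 to 18:00 (made-up glucose values).
base = datetime(2024, 1, 1, 8, 0)
readings = [(base + timedelta(hours=i), 90.0 + 5 * i) for i in range(11)]

meal = datetime(2024, 1, 1, 12, 0)
window = meal_window(readings, meal)
print(len(window))  # number of readings between 10:00 and 16:00 inclusive
```

Keeping the window bounds as parameters makes it easy to reuse the same filter for other event-centered analyses, such as the hours around sleep onset.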
Model architecture: dual-engine design and intelligent collaboration mechanism
Biomni consists of two main components: Biomni-E1, a foundational biomedical environment with a unified action space, and Biomni-A1, an agent designed to make effective use of this environment.
* Biomni-E1 is not a simple collection of tools but a structured "digital laboratory" designed around three principles: authenticity, flexibility, and scalability. All tools, software, and databases are verified by experts; software is deployed in containers and supports version switching, and database queries accept natural-language input; reserved interfaces allow new tools to be added.
* Biomni-A1 is the "decision-making center" of the general agent. Its architecture goes beyond the "input-output" model of traditional AI and follows a problem-solving process similar to that of human scientists. It selects tools dynamically through retrieval-augmented planning. It uses code as a universal interface, which supports complex logic such as loops, parallelism, and conditionals. It also plans adaptively: the initial plan is generated from knowledge and can be adjusted according to feedback during execution.
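To make the idea of retrieval-based tool selection concrete, here is a minimal sketch. It is not Biomni-A1's implementation: plain keyword overlap stands in for the LLM-driven retrieval described above, and the tool names and descriptions in `TOOLS` are hypothetical.

```python
import re

# Hypothetical tool registry: names and descriptions are invented for illustration.
TOOLS = {
    "query_clinvar": "look up clinical significance of a genetic variant in ClinVar",
    "annotate_cell_types": "annotate cell types in single-cell RNA-seq clusters",
    "design_pcr_primers": "design PCR primers for a target DNA sequence",
}

def tokenize(text):
    """Lowercase a string and split it into a set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve_tools(task, tools, top_k=2):
    """Rank tools by word overlap with the task and return the top_k names.

    Keyword overlap is a stand-in for LLM-based retrieval; tools that share
    no words with the task are dropped.
    """
    task_tokens = tokenize(task)
    scored = [(len(task_tokens & tokenize(desc)), name) for name, desc in tools.items()]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked[:top_k]]

task = "Design PCR primers to verify a variant reported in ClinVar"
selected = retrieve_tools(task, TOOLS)
print(selected)
```

In a real agent, the selected subset would be passed to the LLM, which then writes code that calls those tools; narrowing the action space first keeps the planning prompt small even when the environment holds hundreds of tools.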

Experimental results: strong performance from benchmark testing to wet-lab validation
Biomni's performance was validated through multi-level experiments. The results demonstrate not only a technical advance but also the practical value of a general-purpose biomedical AI.
In standardized benchmarks, Biomni showed clear advantages:
* On LAB-Bench, database question answering (DbQA) accuracy reached 74.4%, comparable to human experts (74.7%) and far above the coding agent (40.8%). Sequence reasoning (SeqQA) accuracy reached 81.9%, above the human level (78.8%), indicating that its ability to handle structured data and biological sequences is on par with that of professional researchers.
* On HLE, 52 questions covering 14 fields were evaluated. Biomni reached an accuracy of 17.3%, about 2.9 times that of the base LLM (6.0%) and about 1.35 times that of the coding agent (12.8%). Notably, no development-set tuning was done on HLE, so it tests zero-shot generalization, and the results suggest that Biomni can handle unseen cross-domain problems.
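As a quick sanity check, the HLE ratios can be recomputed from the reported accuracies. The snippet below uses only the figures quoted above; the variable names are ours.

```python
# HLE accuracies (%) as reported in the text above.
hle = {"Biomni": 17.3, "coding agent": 12.8, "base LLM": 6.0}

ratio_vs_llm = hle["Biomni"] / hle["base LLM"]
ratio_vs_agent = hle["Biomni"] / hle["coding agent"]

# With 52 questions, 17.3% corresponds to roughly this many correct answers.
correct_questions = round(52 * hle["Biomni"] / 100)

print(f"{ratio_vs_llm:.2f}x vs base LLM, {ratio_vs_agent:.2f}x vs coding agent")
print(f"about {correct_questions} of 52 questions correct")
```

Because the evaluation set is small, a single additional correct answer shifts accuracy by almost two percentage points, so these ratios are best read as rough indicators.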
In addition, across the 8 real-world tasks, Biomni's average performance far exceeds the baselines: 402.3% higher than the base LLM, 43.0% higher than the coding agent, and 20.4% higher than the variant that uses only ReAct-style reasoning (Biomni-ReAct). On individual tasks, GWAS causal gene detection accuracy reached 68.3% (the human expert average is 71.2%), the semantic match rate for single-cell annotation was 89.7%, and the clinical alignment score for drug repurposing was 0.78 (out of 1.0).

In short, Biomni breaks the "one specialized tool per task" limitation of traditional biomedical AI and autonomously covers the full workflow from gene regulatory network analysis to wet-lab experiment design. This is more than a technical innovation: it points to a future in which virtual AI biologists work alongside human scientists and extend what they can do.
Currently, "Biomni: The First Universal Biomedical Intelligent Agent" is available in the "Tutorials" section of HyperAI's official website (hyper.ai). With one-click deployment, you can try it online: simply enter a biomedical task instruction to start the automated analysis workflow. Tutorial Link:
We have also prepared a bonus for newly registered users. Register on the OpenBayes platform with the invitation code "Biomni" to get 5 hours of free RTX A6000 usage (valid for 1 month). Quantities are limited, first come, first served!
Demo Run
1. After entering the hyper.ai homepage, select the "Tutorials" page, select "Biomni: The First Universal Biomedical Agent", and click "Run this tutorial online".


2. After the page redirects, click "Clone" in the upper right corner to clone the tutorial into your own container.

3. Select "NVIDIA RTX A6000" and the "vllm" image. The OpenBayes platform offers 4 billing options: choose "pay as you go" or a "daily/weekly/monthly" plan according to your needs. Click "Continue". New users can register using the invitation link below to get 4 hours of RTX 4090 plus 5 hours of CPU time for free!
HyperAI exclusive invitation link (copy and open in browser):
https://openbayes.com/console/signup?r=Ada0322_NR0n


4. Wait for resources to be allocated. The first clone takes about 2 minutes. When the status changes to "Running", click "Open Workspace" to go to the demo page.

5. Double-click the project name in the left directory bar to start using it. Run the notebook up to "3. Perform biomedical tasks using natural language" and enter the prompt.
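Step 5 amounts to sending one natural-language prompt to the agent. The sketch below assumes the notebook follows the open-source `biomni` package, whose public README exposes an `A1` agent class with a `go()` method; the tutorial notebook may wire things differently, and the data path and model name here are placeholders rather than values from the tutorial. The prompt is the example prompt used in this tutorial.

```python
# Assumption: the open-source `biomni` package is installed and exposes
# `biomni.agent.A1` with a `go()` method, as shown in the project's README.
# The data path and model identifier are placeholders, not tutorial values.

PROMPT = (
    "Plan a CRISPR screen to identify genes that regulate T cell exhaustion, "
    "generate 32 genes that maximize the perturbation effect."
)

def run_biomni_task(prompt, data_path="./data", llm="<model-name>"):
    """Send one natural-language task to a Biomni agent and return its result."""
    # Imported lazily so this sketch can be loaded without biomni installed.
    from biomni.agent import A1

    agent = A1(path=data_path, llm=llm)
    return agent.go(prompt)

# In the notebook: run_biomni_task(PROMPT, llm="your-model-name")
```

Wrapping the call in a function keeps the prompt, data location, and model choice in one place, which makes it easier to rerun the same task with a different model.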

Results
Example prompt: Plan a CRISPR screen to identify genes that regulate T cell exhaustion, generate 32 genes that maximize the perturbation effect.
The result is shown below:


That is the tutorial recommended by HyperAI. Interested readers are welcome to try it ⬇️
Tutorial Link:
https://go.hyper.ai/Mox9F