Seize the Golden Period of "AI + Biomedicine" and Take Stock of the Most Noteworthy Disruptive Achievements in 2024

In the past year, AI has set off a wave of change around the world, especially in the biomedical field.
AI systems represented by AlphaFold can predict the three-dimensional structure of proteins with unprecedented accuracy, providing revolutionary tools for understanding protein functions and developing targeted drugs. In drug research and development, AI can not only predict drug properties from massive drug datasets, but also design new drugs and shorten the development cycle from laboratory to clinic. AI can also mine massive gene sequencing datasets, quickly identifying gene mutations and helping researchers pinpoint those related to disease. In addition, AI can optimize the cell differentiation process and promote the development of large cell models...
With the 2024 Nobel Prize in Chemistry awarded for computational protein design and protein structure prediction, the revolutionary role of AI in biomedicine has once again been recognized globally.
In this article, HyperAI focuses on the latest research on AI in biomedicine and selects 46 cutting-edge papers it interpreted between 2023 and 2024. These papers appeared in internationally renowned conferences and journals such as CVPR 2024, ICML 2024, ACL 2024, and Nature, and the research institutions span top universities and organizations in China and abroad, including Microsoft Research, DeepMind, Massachusetts Institute of Technology, University of California, Chinese Academy of Sciences, Tsinghua University, Fudan University, Peking University, Zhejiang University, Shanghai Jiao Tong University, Shanghai Artificial Intelligence Laboratory, etc.
Click on the paper title or Chinese interpretation below to jump to the paper interpretation page. We hope you find it helpful.
For more details on the latest achievements of AI+biomedicine, please see:
https://github.com/hyperai/awesome-ai4s
01
Paper title:Accurate de novo design of high-affinity protein binding macrocycles using deep learning, 2024.11

Chinese interpretation:David Baker's latest achievement! De novo design of macrocyclic peptide binder framework RFpeptides, providing new possibilities for undruggable proteins
Research content:David Baker's team has developed RFpeptides, a new diffusion model-based method for designing high-affinity macrocyclic binders against a variety of protein targets.
02
Paper title:BioCLIP: A Vision Foundation Model for the Tree of Life, 2024.02

Chinese interpretation:CVPR Best Student Paper! A large dataset of 10 million images and 450,000+ species, the multimodal model BioCLIP achieves zero-shot learning
Research content:Ohio State University, Microsoft Research, University of California, Irvine, Rensselaer Polytechnic Institute, and others have released TreeOfLife-10M, the largest and most diverse biological image dataset suitable for machine learning to date, and developed BioCLIP, a foundation model for the tree of life. The model makes full use of the diverse images of plants, animals, and fungi in TreeOfLife-10M and significantly outperforms existing methods on a variety of fine-grained biological classification tasks.
03
Paper title:Y-Mol: A Multiscale Biomedical Knowledge-Guided Large Language Model for Drug Development, 2024.10

Chinese interpretation:First! Four universities jointly launched the drug development language model Y-Mol, which is ahead of LLaMA2 in performance
Research content:Research teams from Hunan University, Central South University, Hunan Normal University, and Xiangtan University jointly proposed a large language model Y-Mol guided by multi-scale biomedical knowledge, which can be fine-tuned on different text corpora and instructions, enhancing the model's performance and potential in drug research and development.
04

Research content:The Institute of Synthetic Biology at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, combined automation with the ProEnsemble machine learning framework to overcome the technical barrier of evolutionary uncertainty in metabolic pathways, achieving the leap from laboratory to industrial-scale production of naringenin. Its universal chassis can synthesize flavonoid compounds at high yield.
05
Paper title:Deep Learning-Assisted Automated Multidimensional Single Particle Tracking in Living Cells, 2024.03

Chinese interpretation:Single particle tracking at the nanoscale, Fang Ning's team at Xiamen University uses AI to play "Rock in the Cell"
Research content:Based on deep learning, Professor Fang Ning's team at Xiamen University has developed an automated, high-speed, multi-dimensional single-particle tracking (SPT) system, breaking the limitations of nanoparticle rotation tracking in cellular microenvironments.
06
Paper title:AlphaFold Meets Flow Matching for Generating Protein Ensembles, 2024.06

Chinese interpretation:Selected for ICML! MIT team achieves new breakthrough based on AlphaFold, revealing the dynamic diversity of proteins
Research content:The MIT research team fine-tuned AlphaFold and ESMFold under a custom flow matching framework to obtain sequence-conditioned generative models of protein structure, called AlphaFlow and ESMFlow.
07
Paper title:ProSST: Protein Language Modeling with Quantized Structure and Disentangled Attention, 2024.05

Chinese interpretation:Major breakthrough in PLM! The latest achievement of Shanghai Jiao Tong University and Shanghai AI Lab was selected for NeurIPS 24, ProSST effectively integrates protein structure information
Research content:A team from Shanghai Jiao Tong University has developed ProSST, a pre-trained, structure-aware protein language model that effectively integrates protein structure and amino acid sequence information and outperforms existing models in tasks such as thermal stability prediction, metal ion binding prediction, protein localization prediction, and GO annotation prediction.
08
Paper title:Cytochrome P450 Enzyme Design by Constraining the Catalytic Pocket in a Diffusion Model, 2024.07

Chinese interpretation:The catalytic capacity is increased by 3.5 times! The Chinese Academy of Sciences team developed a P450 enzyme de novo design method based on the diffusion model P450Diffusion
Research content:The new enzyme design team of Tianjin Institute of Industrial Biotechnology, Chinese Academy of Sciences, has developed P450Diffusion, a method for de novo design of P450 enzymes based on a diffusion model and pocket design principles.
09
Paper title:DePLM: Denoising Protein Language Models for Property Optimization, 2024.11

Chinese interpretation:Selected for NeurIPS 24! Zhejiang University team proposed a new denoising protein language model DePLM, which predicts mutation effects better than SOTA models
Research content:The Zhejiang University team proposed a new denoising protein language model (DePLM) for protein optimization. It treats the evolutionary information captured by a protein language model as a mixture of information relevant and irrelevant to the target property, regards the irrelevant part as "noise", and eliminates it. The model has strong generalization ability.
10
Paper title:EquiPocket: an E(3)-Equivariant Geometric Graph Neural Network for Ligand Binding Site Prediction, 2024.07

Chinese interpretation:Selected for ICML! The Renmin University team used an equivariant graph neural network to predict target protein binding sites, with performance gains of up to 20%
Research content:A research team from the Gaoling School of Artificial Intelligence at Renmin University of China applied the E(3) equivariant graph neural network (GNN) to ligand binding site prediction for the first time and proposed the EquiPocket framework, which is helpful for various downstream tasks such as drug discovery.
11

Chinese interpretation:Protein dynamic docking prediction achieved! Shanghai Jiao Tong University/Xingyao Technology/Sun Yat-sen University and others jointly launched the geometric deep generative model DynamicBind
Research content:Shanghai Jiao Tong University, in collaboration with Xingyao Technology, Sun Yat-sen University School of Pharmacy and Rice University in the United States, proposed DynamicBind, a geometric deep generative model designed for protein "dynamic docking". The method was validated by wet-lab experiments in the international drug screening competition CACHE and can screen out competitive lead compounds for difficult-to-drug targets for the treatment of Parkinson's disease.
12

Chinese interpretation:Korean version of AlphaFold? Deep learning model AlphaPPIMd: for protein-protein complex conformation ensemble exploration
Research content:Yonsei University and its collaborators combined deep learning with generative AI to construct the AlphaPPIMd model, which revealed the mysteries of protein interactions through molecular dynamics simulations.
13
Paper title:UniIF: Unified Molecule Inverse Folding, 2024.05

Chinese interpretation:Selected for NeurIPS 2024! Westlake University proposed the universal molecular inverse folding model UniIF, which further complements AlphaFold 3
Research content:A team from the Future Industries Research Center of Westlake University proposed the UniIF model for the inverse folding of all molecules, which has achieved state-of-the-art performance in multiple tasks such as protein design, RNA design, and material design.
14

Research content:A team from Shanghai Jiao Tong University designed a diffusion probabilistic model framework, CPDiffusion, which can learn the implicit mapping between protein sequence, structure and function at very low training and data cost, thereby generating diverse protein sequences.
15
Paper title:ProtT3: Protein-to-Text Generation for Text-based Protein Understanding, 2023.05

Chinese interpretation:Selected for ACL 2024! To achieve cross-modal interpretation of protein data and text information, Wang Xiang's team from USTC proposed a protein-text generation framework ProtT3
Research content:The University of Science and Technology of China, in collaboration with the National University of Singapore and Hokkaido University, proposed a new protein-text modeling framework, ProtT3. The framework bridges a protein language model (PLM) and a language model (LM), which operate on different modalities, through a cross-modal projector, and has achieved excellent performance in protein captioning, protein question-answering, and protein-text retrieval tasks.
16
Paper title:InstructProtein: Aligning Human and Protein Language via Knowledge Instruction, 2023.10

Chinese interpretation:Selected for ACL2024 Main Conference | InstructProtein: Aligning protein language with human language using knowledge instructions
Research content:A research team from Zhejiang University proposed InstructProtein, which uses knowledge instructions to align protein language with human language, demonstrating the ability to integrate biological sequences into large language models.
17
Paper title:ESM All-Atom: Multi-scale Protein Language Model for Unified Molecular Modeling, 2024.06

Chinese interpretation:Selected for ICML, Tsinghua AIR and others jointly released the protein language model ESM-AA, surpassing the traditional SOTA
Research content:A joint research team from Tsinghua University, Peking University and Nanjing University proposed a multi-scale protein language model ESM-AA, which significantly improved its performance in tasks such as target-ligand binding.
18
Paper title:Sequence modeling and design from molecular to genome scale with Evo, 2024.11

Chinese interpretation:Be the first to experience the demo! The genomic foundation model Evo is on the cover of Science, achieving prediction and generation from molecular to genomic scales
Research content:The Evo model can predict, generate and design genome sequences, and is expected to be applied in gene editing, drug discovery, disease diagnosis, agriculture and other fields. The tutorial "Evo: Prediction and Generation from Molecular to Genome Scale" is now online in the HyperAI tutorial section and can be cloned with one click.
19
Paper title:Large-scale foundation model on single-cell transcriptomics, 2024.06

Chinese interpretation:A cell model with 100 million parameters is here! In a Nature journal, a Tsinghua University team released scFoundation: Simultaneous modeling of 20,000 genes
Research content:The Life Foundation Model Laboratory of the Department of Automation and the Department of Electronics/AIR of Tsinghua University collaborated to build scFoundation, a large cell model with 100 million parameters, which can process approximately 20,000 genes simultaneously and has shown significant performance improvements in tasks such as sequencing depth enhancement, cell drug response prediction, and cell perturbation prediction.
20
Paper title:Enhancing efficiency of protein language models with minimal wet-lab data through few-shot learning, 2024.07

Chinese interpretation:20 experimental data points create an AI protein milestone! Shanghai Jiao Tong University and Shanghai AI Lab jointly released FSFP to effectively optimize protein pre-training models
Research content:Shanghai Jiao Tong University, in collaboration with the Shanghai Artificial Intelligence Laboratory, proposed FSFP, a fine-tuning method for protein pre-training models. The method can efficiently train a protein pre-training model using only 20 randomly selected wet-lab data points and significantly improves the positive rate of the model's single-point mutation predictions.
21
Paper title:Protein Engineering with Lightweight Graph Denoising Neural Networks, 2024.04

Chinese interpretation:Guiding protein directed evolution without experimental data, a research group at Shanghai Jiao Tong University published the microenvironment-aware graph neural network ProtLGN
Research content:Shanghai Jiao Tong University has developed a microenvironment-aware graph neural network called ProtLGN, which can learn and predict beneficial amino acid mutation sites from the three-dimensional structure of proteins, and guide the design of single-site and multi-site mutations in proteins with different functions.
22

Chinese interpretation:Published in a Cell journal! The research group led by Zhang Qiangfeng from Tsinghua University developed the SPACE algorithm, which has the best tissue module discovery capability among similar tools
Research content:The School of Life Sciences of Tsinghua University/Advanced Innovation Center for Structural Biology/Tsinghua-Peking University Joint Center for Life Sciences has developed an artificial intelligence algorithm SPACE based on the graph autoencoder deep learning framework, which can identify spatial cell types and discover tissue modules from spatial transcriptome data with single-cell resolution.
23
Paper title:Deep Learning Empowers the Discovery of Self-Assembling Peptides with Over 10 Trillion Sequences, 2023.09

Chinese interpretation:Westlake University uses a Transformer to analyze the self-assembly characteristics of tens of billions of peptides and crack the self-assembly rules
Research content:The Westlake University team used a Transformer-based regression network to predict the self-assembly properties of tens of billions of peptides and analyzed the effects of amino acids at different positions on the self-assembly properties, providing a powerful new tool for the study of self-assembling peptides.
24
Paper title:IMN4NPD: An Integrated Molecular Networking Workflow for Natural Product Dereplication, 2024.02

Chinese interpretation:To fully explore the active ingredients of natural medicines, Professor Liu Shao's team from Central South University built the IMN4NPD platform
Research content:The team from Central South University built the IMN4NPD platform by integrating two different molecular networks, which can be used to comprehensively explore the trace and structure-specific active ingredients of natural medicines.
25
Paper title:AlphaProteo generates novel proteins for biology and health research, 2024.09

Chinese interpretation:DeepMind's new results are criticized as advertisements? AlphaProteo can efficiently design target protein binders with 300 times higher affinity
Research content:DeepMind released AlphaProteo for novel protein design, which can generate "ready-to-use" protein binders through only one round of medium-throughput screening without further optimization.
26
Paper title:Fast, sensitive detection of protein homologs using deep dense retrieval, 2024.08

Chinese interpretation:Sensitivity improved by 56%, CUHK/Fudan/Yale and others jointly proposed a new protein homolog detection method
Research content:The Chinese University of Hong Kong, in collaboration with the Laboratory of Intelligent Complex Systems of Fudan University, the Shanghai Artificial Intelligence Laboratory, and Yale University, proposed an ultra-fast and highly sensitive protein homolog detection framework.
27
Paper title:Generating All-Atom Protein Structure from Sequence-Only Training Data, 2024.12

Chinese interpretation:Shared by LeCun! UC Berkeley et al. proposed a multimodal protein generation method PLAID, which generates sequences and all-atom protein structures at the same time
Research content:The University of California, Berkeley, Microsoft Research, and others proposed a multimodal protein generation method PLAID, which can achieve multimodal generation by generating scarcer modalities (such as crystal structures) from richer data modalities (such as sequences).
28
Paper title:Accurate proteome-wide missense variant effect prediction with AlphaMissense, 2023.09

Chinese interpretation:DeepMind uses unsupervised learning to develop AlphaMissense, predicting 71 million gene mutations
Research content:DeepMind developed AlphaMissense and predicted 71 million possible missense mutations in humans, classifying 32% as likely pathogenic and 57% as likely benign. These results will greatly promote the development of molecular biology, genomics, clinical medicine and other disciplines.
29

Chinese interpretation:Can inhibit cancer cell proliferation! The Wisdom Lake Academy of Pharmacy and Tianjin Medical University jointly developed a new tumor suppressor protein degrader dp53m
Research content:The Wisdom Lake Academy of Pharmacy of Xi'an Jiaotong-Liverpool University, in collaboration with Tianjin Medical University General Hospital, has developed dp53m, a selective p53-R175H degrader that can specifically recognize the mutant p53-R175H protein, achieve targeted degradation of the target protein, and inhibit the functional expression of the mutant p53 protein.
30

Chinese interpretation:Yu Xiang's research group at Shanghai Jiao Tong University released a transferable deep learning model to identify multiple types of RNA modifications and significantly reduce computational costs
Research content:Shanghai Jiao Tong University, in collaboration with the Shanghai Chenshan Botanical Garden team, developed a transferable deep learning model, TandemMod, that enables the identification of multiple types of RNA modifications in direct RNA sequencing (DRS).
31
Paper title:Drug repositioning with adaptive graph convolutional networks, 2024.01

Chinese interpretation:New uses for old drugs: AdaDR released by the Central South University team for drug repositioning based on adaptive graph convolutional networks
Research content:The research team of Central South University proposed an adaptive GCN method called AdaDR, which performs drug repositioning by deeply integrating node features and topological structures.
32
Paper title:Generative AI for designing and validating easily synthesizable and structurally novel antibiotics, 2024.03

Chinese interpretation:Good news for patients with drug-resistant bacterial infections! McMaster and Stanford University team up to develop new antibiotics using generative AI
Research content:Researchers from McMaster University and Stanford University have developed a generative AI model, SyntheMol, that can design new compounds that are easy to synthesize based on the chemical space of nearly 30 billion molecules.
33
Paper title:VirusImmu: a novel ensemble machine learning approach for viral immunogenicity prediction, 2023.11

Chinese interpretation:New breakthrough in vaccine research and development: Beihang team proposes a new method for predicting viral antigen immunogenicity, VirusImmu
Research content:A team from Beihang University has developed VirusImmu, a machine learning ensemble method for predicting the immunogenicity of viral antigens. It shows great potential in predicting the immunogenicity of viral protein fragments and provides a tool for vaccine developers.
34
Paper title:UniKP: a unified framework for the prediction of enzyme kinetic parameters, 2023.12

Chinese interpretation:Luo Xiaozhou's team from the Chinese Academy of Sciences proposed the UniKP framework, a large model + machine learning to predict enzyme kinetic parameters with high precision
Research content:A team from the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, proposed an enzyme kinetic parameter prediction framework (UniKP) that predicts a variety of enzyme kinetic parameters.
35
Paper title:Mosaic integration and knowledge transfer of single-cell multimodal data with MIDAS, 2024.01

Chinese interpretation:Independently developed! A team from the Academy of Military Medical Sciences proposed MIDAS, which can be used for mosaic integration of single-cell multi-omics data
Research content:A team from the Academy of Military Medical Sciences proposed a computational tool, MIDAS, for mosaic integration of single-cell multi-omics data and knowledge transfer. It realized for the first time the general integration functions of single-cell multi-omics mosaic data, such as modal alignment, data completion, and batch correction.
36
Paper title:ResGen is a pocket-aware 3D molecular generation model based on parallel multiscale modeling, 2023.09

Chinese interpretation:8 times faster than the best technology: Hou Tingjun et al. from Zhejiang University proposed ResGen, a 3D molecular generation model based on protein pockets
Research content:A research team from Zhejiang University and Zhejiang Lab proposed ResGen, a 3D molecular generation model based on protein pockets, which is 8 times faster than the previous best technology and successfully generated drug-like molecules with lower binding energy and higher diversity.
37
Paper title:A principal odor map unifies diverse tasks in olfactory perception, 2023.08

Chinese interpretation:Google develops odor recognition AI based on GNN, which is equivalent to 70 years of continuous work by human evaluators
Research content:Osmo, a spin-off of Google Research, has developed an odor analysis AI based on graph neural networks. It can describe the smell of a chemical molecule from its structure, and it outperforms human evaluators on 53% of chemical molecules and 55% of odor descriptors.
38
Paper title:Machine learning enhances prediction of plants as potential sources of antimalarials, 2023.05

Chinese interpretation:Kew Gardens, UK, uses machine learning to predict the antimalarial potential of plants, increasing accuracy from 0.46 to 0.67
Research content:Researchers from the Royal Botanic Gardens, Kew and the University of St Andrews have demonstrated that machine learning algorithms can effectively predict the antimalarial properties of plants with an accuracy of 0.67, a significant improvement over the 0.46 of traditional selection methods.
39
Paper title:Machine learning models to accelerate the design of polymeric long-acting injectables, 2023.01

Chinese interpretation:Comparing 11 algorithms horizontally, the University of Toronto launched a machine learning model to accelerate the development of new long-acting injectable drugs
Research content:Researchers at the University of Toronto have developed a machine learning model that can predict the release rate of long-acting injectable drugs, speeding up the overall drug development process.
40

Chinese interpretation:Li Honglin's research group at East China University of Science and Technology develops Macformer to accelerate the discovery of macrocyclic drugs
Research content:A team from East China University of Science and Technology developed the Transformer-based model Macformer and used it to macrocyclize the acyclic drug fedratinib, obtaining new compounds with stronger efficacy and providing a new method for drug development.
41

Chinese interpretation:Peking University develops a pluripotent stem cell differentiation system based on machine learning to efficiently and stably prepare functional cells
Research content:A team from Peking University and Beijing Jiaotong University has developed a differentiation system based on living cell bright-field dynamic imaging and machine learning, which can intelligently regulate and optimize the differentiation process of pluripotent stem cells in real time, achieving efficient and stable production of functional cells.
42
Paper title:Predicting pharmaceutical inkjet printing outcomes using machine learning, 2023.12

Chinese interpretation:New breakthrough in 3D printing of drugs: the University of Santiago de Compostela uses machine learning to screen inkjet printing bio-inks with an accuracy of up to 97.22%
Research content:Researchers from the University of Santiago de Compostela and University College London applied machine learning models to predict bio-ink printability and successfully improved prediction accuracy.
43
Paper title:Deep learning-guided discovery of an antibiotic targeting Acinetobacter baumannii, 2023.05

Chinese interpretation:AI fights superbugs: McMaster University uses deep learning to discover a new antibiotic abaucin
Research content:Researchers from McMaster University and MIT used deep learning to screen approximately 7,500 molecules and identified abaucin, a new antibiotic that inhibits Acinetobacter baumannii.
44
Paper title:Discovery of Senolytics using machine learning, 2023.05

Chinese interpretation:To prevent cell aging and stay away from age-related diseases, the University of Edinburgh has issued three "AI anti-aging prescriptions" for cells
Research content:The University of Edinburgh and the University of Cantabria used machine learning to discover three anti-aging drugs - Ginkgetin, Periplocin and Oleandrin, and verified their anti-aging effects in human cell lines.
45
Paper title:Rules and mechanisms governing G protein coupling selectivity of GPCRs, 2023.09

Chinese interpretation:University of Florida uses neural networks to decipher GPCR-G protein coupling selectivity
Research content:Researchers at the University of Florida determined the binding selectivity of GPCRs and G proteins, developed an algorithm to predict the selectivity of the two, and studied the structural basis of this selectivity.
46
Paper title:Discovery of a structural class of antibiotics with explainable deep learning, 2023.12

Chinese interpretation:The curse of "super bacteria" may be broken. MIT uses deep learning to discover new antibiotics
Research content:MIT researchers used the graph neural network Chemprop to identify potential antibiotics from a large chemical library and discovered a new class of antibiotics.
The above are the cutting-edge papers on AI+biomedicine summarized in this issue. For more of the latest results, please see:
https://github.com/hyperai/awesome-ai4s
