Accepted at ICML 2025: Meta, Cambridge, and MIT Propose the All-atom Diffusion Transformer, the First Framework to Unify Generation of Periodic and Non-periodic Atomic Systems

At the frontier of scientific research and industrial applications, generative modeling of the three-dimensional structure of atomic systems is showing disruptive potential and is expected to reshape the inverse design of new molecules and materials. From accurate structure prediction to flexible conditional generation, state-of-the-art diffusion and flow matching models now lead key tasks such as biomolecule analysis, materials discovery, and structure-based drug design, and have become a core tool for researchers working to break through technical bottlenecks.
However, behind this booming field, one key problem has long held back progress: existing models do not generalize across systems. Although all atomic systems follow the same physical principles that determine their three-dimensional structures and interactions, small molecules, biomolecules, crystals, and their composite systems have long been modeled separately. Most diffusion models depend heavily on the characteristics of a specific system and must perform multimodal generation on a complex product manifold in which categorical data (such as atom types) and continuous data (such as three-dimensional coordinates) are intertwined, which makes models built for different systems hard to reconcile.
Consider some specific cases. De novo generation of small molecules is split into two separate diffusion processes, one over atom types (categorical) and one over three-dimensional coordinates (continuous); the denoising model must learn how the two evolve together, yet distorted intermediate states often reduce sampling efficiency. Modeling biomolecules additionally requires rotation manifolds and treats groups of atoms as rigid bodies. Diffusion for crystals and materials must respect periodicity and operates on a joint manifold of atom types, fractional coordinates, and lattice parameters. These differences have made unified cross-system modeling a long-standing open challenge in the field.
In this context, a joint research team from Meta's Fundamental AI Research (FAIR), the University of Cambridge, and the Massachusetts Institute of Technology proposed the All-atom Diffusion Transformer (ADiT).
ADiT is a unified latent diffusion framework based on the Transformer. Its core advantage is that it breaks the modeling barrier between periodic and non-periodic systems: through two main innovations, an all-atom unified latent representation and Transformer-based latent diffusion, a single model generates both molecules and crystals. The design introduces almost no inductive bias, which makes the autoencoder and diffusion model far more efficient than traditional equivariant diffusion models in both training and inference. On the same hardware, the time to generate 10,000 samples drops from 2.5 hours to less than 20 minutes. More importantly, when the model is scaled up to 500 million parameters, its performance shows a predictable linear improvement. This property lays a foundation for building a general-purpose foundation model for generative chemistry and marks a milestone toward general, large-scale atomic system modeling.
The work was accepted at ICML 2025 under the title "All-atom Diffusion Transformers: Unified generative modelling of molecules and materials".
Research highlights:
* ADiT is the first model to unify generative modeling of periodic materials and non-periodic molecular systems
* ADiT relies on an all-atom unified latent representation and uses a Transformer for latent diffusion, which simplifies the generation process and introduces almost no inductive bias
* ADiT is highly scalable and efficient, with training and inference speeds far exceeding those of equivariant diffusion models

Paper address:
Datasets: From periodic to non-periodic, covering experimental data in multiple fields
In this study, the research team ran experiments on several representative datasets:
* MP20 contains 45,231 metastable crystal structures from the Materials Project, with at most 20 atoms per unit cell and 89 different elements, and is a good representative of periodic material systems;
* QM9 consists of 130,000 stable small organic molecules with up to 9 heavy atoms (C, N, O, F) plus hydrogen atoms, and is a typical representative of non-periodic molecular systems;
* GEOM-DRUGS contains 430,000 large organic molecules with up to 180 atoms each;
* QMOF contains 14,000 metal-organic framework structures.
Among these, MP20 and QM9 correspond to different types of atomic systems and provide the basis for jointly training the model on periodic and non-periodic systems. The research team split the data following previous work to ensure a fair comparison with other models. GEOM-DRUGS and QMOF further broaden the scope of testing and allow a more comprehensive test of the model's generalization ability.
ADiT: Building a unified atomic system generation model on two core ideas
As a latent diffusion model, the core design of ADiT revolves around two key ideas to achieve unified generative modeling of periodic and non-periodic atomic systems.
The first key idea is the all-atom unified latent representation. The research team treats both periodic and non-periodic atomic systems as sets of atoms in three-dimensional space and develops a unified representation that includes each atom's categorical attributes (such as atom type) and continuous attributes (such as three-dimensional coordinates). By training a variational autoencoder (VAE) for all-atom reconstruction, the encoder learns to embed molecules and crystals into a shared latent space. This provides the basic framework for treating different types of atomic systems in a unified way.
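To make the idea of one representation for both kinds of system concrete, here is a minimal sketch. It is not the authors' code: the `AtomicSystem` class, its field names, and the `frac_to_cart` helper are all hypothetical, and the choice to store an optional lattice is an assumption made for illustration. It only shows that a molecule and a crystal can share the same per-atom fields (a categorical type and continuous coordinates), with periodicity as extra information.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Lattice = Tuple[Vec3, Vec3, Vec3]  # row vectors a, b, c


@dataclass
class AtomicSystem:
    """Hypothetical container: a set of atoms, optionally with a periodic lattice."""
    atomic_numbers: List[int]          # categorical attribute per atom
    positions: List[Vec3]              # continuous attribute per atom (Cartesian)
    lattice: Optional[Lattice] = None  # None for non-periodic molecules

    @property
    def is_periodic(self) -> bool:
        return self.lattice is not None


def frac_to_cart(frac: Vec3, lattice: Lattice) -> Vec3:
    """Convert fractional coordinates to Cartesian: r = f_a*a + f_b*b + f_c*c."""
    a, b, c = lattice
    return tuple(frac[0] * a[i] + frac[1] * b[i] + frac[2] * c[i] for i in range(3))


# A water-like molecule (non-periodic): no lattice.
water = AtomicSystem(
    atomic_numbers=[8, 1, 1],
    positions=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
)

# A toy cubic crystal (periodic): the same per-atom fields plus a lattice.
cubic: Lattice = ((4.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 4.0))
crystal = AtomicSystem(
    atomic_numbers=[11, 17],
    positions=[frac_to_cart((0.0, 0.0, 0.0), cubic),
               frac_to_cart((0.5, 0.5, 0.5), cubic)],
    lattice=cubic,
)
```

Because both objects expose the same fields, a single encoder could in principle consume either one; how ADiT actually featurizes these attributes is not specified here.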
The second key idea is to use a Transformer for latent diffusion. In the latent space constructed by the VAE encoder, the research team uses a Diffusion Transformer (DiT) for generative modeling. At inference time, new latent variables are sampled with classifier-free guidance. These latents are then reconstructed into valid molecules or crystals by the VAE decoder, completing the mapping from latent space back to an actual atomic system.
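Classifier-free guidance is commonly implemented by blending a conditional and an unconditional prediction from the same denoiser. The sketch below shows one widely used convention, `uncond + scale * (cond - uncond)`; the function name `cfg_combine` is hypothetical, the predictions are plain lists standing in for model outputs, and the exact parameterization ADiT uses is an assumption not confirmed by the text above.

```python
from typing import List


def cfg_combine(cond: List[float], uncond: List[float], scale: float) -> List[float]:
    """Blend conditional and unconditional denoiser outputs.

    scale = 0 returns the unconditional prediction, scale = 1 the conditional
    one, and scale > 1 extrapolates further toward the condition.
    """
    if len(cond) != len(uncond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Stand-in predictions for one latent vector (illustrative values only).
cond_pred = [1.0, 2.0]
uncond_pred = [0.0, 0.0]
guided = cfg_combine(cond_pred, uncond_pred, scale=2.0)
```

In a sampler, this blended prediction would replace the raw denoiser output at every integration step; the condition could be, for example, a label saying whether a molecule or a crystal is wanted.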
Based on these two core ideas, ADiT's method proceeds in two stages.
In the first stage, the researchers build an autoencoder for reconstruction. A VAE jointly reconstructs the all-atom representations of molecules and materials, learning a shared latent space. This is the prerequisite for unified modeling of different atomic systems and lays the foundation for the subsequent generation process.
In the second stage, the researchers build a latent diffusion generative model. A DiT generates new samples in the latent space, which are sampled with classifier-free guidance and decoded into valid molecules or crystals. The main advantage of this latent diffusion design is that the complexity of handling categorical and continuous attributes is shifted to the autoencoder, which makes the generative process in latent space simpler and more scalable and improves the model's efficiency and adaptability across different atomic systems.
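The point that the autoencoder absorbs the categorical/continuous split can be illustrated with a toy round trip. Everything here is a stand-in and an assumption: the real encoder and decoder are learned networks, whereas this sketch uses a fixed one-hot encoding, an argmax decoder, and Gaussian noise in place of a diffusion model. The names `ELEMENTS`, `encode_atom`, `decode_atom`, and `perturb` are hypothetical.

```python
import random
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
ELEMENTS = [1, 6, 7, 8]  # toy vocabulary of atomic numbers (assumption)


def encode_atom(z: int, xyz: Vec3) -> List[float]:
    """Toy 'encoder': one-hot atom type followed by coordinates, as one continuous vector."""
    one_hot = [1.0 if z == e else 0.0 for e in ELEMENTS]
    return one_hot + list(xyz)


def decode_atom(latent: List[float]) -> Tuple[int, Vec3]:
    """Toy 'decoder': argmax over the type slots, coordinates passed through."""
    k = len(ELEMENTS)
    type_scores, coords = latent[:k], latent[k:]
    z = ELEMENTS[max(range(k), key=lambda i: type_scores[i])]
    return z, (coords[0], coords[1], coords[2])


def perturb(latent: List[float], sigma: float, rng: random.Random) -> List[float]:
    """Stand-in for the generative step: it only ever sees continuous numbers."""
    return [v + rng.gauss(0.0, sigma) for v in latent]


rng = random.Random(0)
latent = encode_atom(8, (0.0, 0.0, 0.0))
noisy = perturb(latent, sigma=0.05, rng=rng)
z, xyz = decode_atom(noisy)  # the discrete type is recovered only at decode time
```

The generative step (`perturb` here, a DiT in ADiT) never has to reason about discrete atom types directly, which is the simplification the two-stage design is described as providing.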

ADiT Achieves Leading Performance in Crystal and Molecule Generation
To demonstrate ADiT's performance, the research team compared it against several types of baseline models. For crystal generation, the baselines include CDVAE, DiffCSP, FlowMM, and other equivariant diffusion and flow matching models based on multimodal product manifolds, as well as the non-equivariant diffusion model UniMat and the two-stage framework FlowLLM. For molecule generation, ADiT is compared with models such as the Equivariant Diffusion Model, GeoLDM, and Symphony.
In the experiments, ADiT reaches state-of-the-art (SOTA) results in both crystal and molecule generation tasks. For crystal generation, ADiT-generated crystals perform well on key metrics such as validity, stability, uniqueness, and novelty. For molecule generation, ADiT ranks among the top models in validity and uniqueness over 10,000 sampled molecules.
ADiT's joint training also brings significant performance gains. Experimental data show that ADiT trained on both the QM9 and MP20 datasets outperforms versions trained on only one dataset in both material and molecule generation tasks.
ADiT's performance improves predictably as the model grows. As shown in the figure below, as the number of DiT denoiser parameters increases from 32 million (ADiT-S, blue) to 130 million (ADiT-B, orange) and then to 450 million (ADiT-L, green), the diffusion training loss keeps decreasing and the validity rate steadily increases, even on a medium-sized dataset of about 130,000 samples. This strong correlation between model size and performance suggests that scaling up model parameters and data volume could bring further gains for ADiT.

In terms of efficiency, ADiT shows a significant speed advantage over equivariant diffusion models. As shown in the figure below, when generating 10,000 samples on an NVIDIA V100 GPU, the standard Transformer-based ADiT scales much better with the number of integration steps than FlowMM and GeoLDM, which use computationally intensive equivariant networks. Even though ADiT-B has 100 times more parameters than the equivariant baselines, its inference is still faster, highlighting the scalability advantage of the Transformer architecture.

ADiT's scalability to larger systems has also been demonstrated. On the GEOM-DRUGS dataset, which contains 430,000 molecules with up to 180 atoms, ADiT performs comparably to state-of-the-art equivariant diffusion and flow matching models on validity and PoseBusters metrics. Notably, ADiT is based on the standard Transformer architecture, introduces almost no molecular inductive bias, and does not require explicit prediction of atomic bonds, yet it matches the performance of equivariant models, further demonstrating the generality and broad applicability of its design.
Industry and research jointly promote breakthrough innovation in the generation of three-dimensional structures of atomic systems
Beyond ADiT, both academia and industry have been working on generative modeling of the three-dimensional structures of atomic systems and have produced notable results.
In academia, a research team from the University of California, Berkeley, Microsoft Research, and Genentech has introduced PLAID, a multimodal protein generation method. It leverages the structural information in pre-trained weights and performs denoising with a DiT, outperforming other baseline methods in structural quality and diversity across different protein lengths.
Industry is also actively exploring this field. BioGeo, a Chinese generative AI company focused on protein design, has released GeoFlow V2, described as the world's first all-round protein foundation model. It builds a unified atomic diffusion model architecture that handles both protein structure prediction and design tasks. In predicting the structures of antibodies and antigen-antibody complexes, GeoFlow V2 leads similar products in accuracy and speed. In a different domain, Seedance 1.0 from ByteDance combines variational autoencoders with diffusion transformers for fast, efficient AI video generation. Its speed opens up new possibilities for real-time creation and interactive applications and points to broad commercial prospects.
These scientific breakthroughs in academia and innovative practices in industry are jointly advancing generative modeling of the three-dimensional structures of atomic systems. As the technology matures, this field is expected to play a greater role in areas such as materials discovery and drug design, providing strong support for solving global scientific problems and industrial challenges.