MIT Researchers Develop AI Model That Predicts Chemical Reactions with Physical Accuracy Using Electron Conservation
A team of researchers at MIT has developed a new generative AI approach to predicting chemical reactions that significantly improves accuracy by embedding fundamental physical laws, such as the conservation of mass and electrons, directly into the model. The method, called FlowER (Flow matching for Electron Redistribution), was published on August 20 in the journal Nature. It represents a major step toward AI predictions in chemistry that are both scientifically rigorous and reliable.
The work was led by Connor Coley, the Class of 1957 Career Development Professor in MIT's departments of Chemical Engineering and Electrical Engineering and Computer Science, along with former postdoc Joonyoung Joung, software engineer Mun Hong Fong, graduate student Nicholas Casetti, postdoc Jordan Liles, and undergraduate Ne Dassanayake.
Earlier AI approaches to reaction prediction, including attempts to use large language models such as ChatGPT, often fail to respect core chemical principles. These models treat atoms as tokens and can generate invalid reactions that create or destroy atoms, violating the law of conservation of mass. As Joung explains, this lack of physical grounding makes such predictions resemble "alchemy" rather than science.
To solve this, the MIT team adopted a decades-old framework developed by chemist Ivar Ugi, which represents the electrons in a molecule with a bond-electron matrix: off-diagonal entries record the bond order between each pair of atoms, and diagonal entries record each atom's nonbonding valence electrons. FlowER uses this matrix to track all electrons and bonds throughout a reaction, ensuring that no atoms or electrons are created or lost. This explicit conservation mechanism is a key innovation.
The model was trained on more than a million chemical reactions sourced from U.S. Patent Office databases. Despite this large dataset, the current version has limitations: it lacks coverage of reactions involving metals and complex catalytic systems.
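To make Ugi's bond-electron bookkeeping concrete, the short Python sketch below builds matrices for the textbook overall reaction H2 + Cl2 → 2 HCl (treated here as a single transformation purely for accounting, not as an elementary mechanistic step). It computes a reaction matrix R = E − B and checks that its entries sum to zero, which is the condition for no electrons being created or destroyed. The atom ordering, the example matrices, and every helper name (`conserves_electrons`, `formal_charges` and so on) are hypothetical illustrations chosen for this sketch; this is not FlowER's code and it does not attempt the flow-matching model itself.

```python
from typing import List

Matrix = List[List[int]]

# Hypothetical atom ordering shared by every matrix below: H1, H2, Cl1, Cl2.
ATOMS = ["H", "H", "Cl", "Cl"]
VALENCE = {"H": 1, "Cl": 7}  # valence electrons of each neutral element

# Reactants H2 + Cl2: off-diagonal entries are bond orders,
# diagonal entries are nonbonding (lone) valence electrons.
REACTANTS: Matrix = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 6, 1],
    [0, 0, 1, 6],
]

# Products 2 HCl: H1-Cl1 and H2-Cl2 bonds; each Cl keeps six lone electrons.
PRODUCTS: Matrix = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 6, 0],
    [0, 1, 0, 6],
]

# A hypothetical invalid prediction: Cl1 gains an extra lone pair from nowhere.
BAD_PRODUCTS: Matrix = [
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 8, 0],
    [0, 1, 0, 6],
]


def is_symmetric(m: Matrix) -> bool:
    """A bond between atoms i and j must appear at both (i, j) and (j, i)."""
    n = len(m)
    return all(m[i][j] == m[j][i] for i in range(n) for j in range(n))


def total_electrons(m: Matrix) -> int:
    """Sum of all entries: lone electrons plus two electrons per unit of bond order."""
    return sum(sum(row) for row in m)


def formal_charges(m: Matrix, atoms: List[str] = ATOMS) -> List[int]:
    """Formal charge of atom i = its valence electrons minus its row sum."""
    return [VALENCE[a] - sum(row) for a, row in zip(atoms, m)]


def reaction_matrix(b: Matrix, e: Matrix) -> Matrix:
    """Ugi-style reaction matrix R = E - B, computed entry by entry."""
    return [[e_ij - b_ij for b_ij, e_ij in zip(b_row, e_row)]
            for b_row, e_row in zip(b, e)]


def conserves_electrons(b: Matrix, e: Matrix) -> bool:
    """Both matrices index the same atom list, so equal size means no atom was
    added or removed; R must then sum to zero for electrons to be conserved."""
    if len(b) != len(e) or not (is_symmetric(b) and is_symmetric(e)):
        return False
    r = reaction_matrix(b, e)
    return total_electrons(r) == 0


print(conserves_electrons(REACTANTS, PRODUCTS))      # True
print(conserves_electrons(REACTANTS, BAD_PRODUCTS))  # False
print(formal_charges(BAD_PRODUCTS))                  # [0, 0, -2, 0]
```

Both valid matrices sum to 16 electrons (1 + 1 + 7 + 7), while the invalid prediction sums to 18, so its reaction matrix has a nonzero total and it is rejected. A model whose outputs are expressed as electron redistributions that must satisfy this constraint cannot produce that kind of "alchemy" in the first place.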
Despite these gaps in coverage, the researchers emphasize that FlowER matches or exceeds existing methods in predicting standard reaction pathways and can generalize to reaction types it has not seen before.
What sets FlowER apart is not only its accuracy but also its foundation in real experimental data. Rather than inventing mechanisms from scratch, the team inferred plausible reaction pathways for experimentally validated reactions from the chemical literature, creating a dataset of mechanistic steps that is now openly available on GitHub. This dataset, developed by Joung, is one of the first of its kind to be made publicly available for broad use.
Coley notes that although the model is still at an early stage, it offers chemists a powerful tool for exploring reactivity, mapping out reaction mechanisms, and designing new synthetic routes. Potential applications span drug discovery, materials science, combustion, atmospheric chemistry, and electrochemistry.
Looking ahead, the team plans to extend the model to metals and catalytic cycles, areas currently underrepresented in the training data. They see FlowER as a foundational step toward AI systems that not only predict reactions but also help discover entirely new ones and uncover unknown mechanisms.
The research was supported by the Machine Learning for Pharmaceutical Discovery and Synthesis consortium and the National Science Foundation. Because the project is open source, the model, data, and tools are accessible to the global scientific community, which should foster collaboration and accelerate progress in computational chemistry.