MADGEN: Mass-Spec attends to De Novo Molecular generation

Yinkai Wang, Xiaohui Chen, Liping Liu, Soha Hassoun*

Abstract

The annotation (assigning structural chemical identities) of MS/MS spectra remains a significant challenge due to the enormous molecular diversity in biological samples and the limited scope of reference databases. Currently, the vast majority of spectral measurements remain in the "dark chemical space" without structural annotations. To improve annotation, we propose MADGEN (Mass-spec Attends to De Novo Molecular GENeration), a scaffold-based method for de novo molecular structure generation guided by mass spectrometry data. MADGEN operates in two stages: scaffold retrieval, followed by spectra-conditioned molecular generation starting from the scaffold. In the first stage, given an MS/MS spectrum, we formulate scaffold retrieval as a ranking problem and employ contrastive learning to align mass spectra with candidate molecular scaffolds. In the second stage, starting from the retrieved scaffold, we use the MS/MS spectrum to guide an attention-based generative model that generates the final molecule. Our approach constrains the molecular generation search space, reducing its complexity and improving generation accuracy. We evaluate MADGEN on three datasets (NIST23, CANOPUS, and MassSpecGym), using both a predictive scaffold retriever and an oracle retriever. With the oracle retriever, MADGEN achieves strong results, demonstrating the effectiveness of using attention to integrate spectral information throughout the generation process.
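The two stages described above can be illustrated with a minimal NumPy sketch. It assumes that spectra and scaffolds have already been mapped to fixed-length embedding vectors by upstream encoders (not shown), and it uses a symmetric InfoNCE loss as one common form of contrastive alignment; the abstract does not specify MADGEN's actual encoders, loss, or temperature. Retrieval is then a ranking of candidate scaffolds by cosine similarity, and the attention step shows single-head cross-attention in which atom features (queries) attend to spectral peak features (keys and values). All function names, shapes, and parameters here (`info_nce_loss`, `rank_scaffolds`, `cross_attention`, `temperature`, the weight matrices) are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def info_nce_loss(spec_emb, scaf_emb, temperature=0.1):
    """Symmetric InfoNCE loss over a batch of matched (spectrum, scaffold) pairs.

    Assumption: row i of spec_emb and row i of scaf_emb form the positive pair;
    every other scaffold in the batch acts as a negative for spectrum i.
    """
    s = l2_normalize(spec_emb)
    c = l2_normalize(scaf_emb)
    logits = s @ c.T / temperature  # (n, n) scaled cosine-similarity matrix

    def diag_cross_entropy(l):
        # Row-wise log-softmax (shifted for numerical stability),
        # then average the negative log-probability of the diagonal (correct) entries.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # Spectrum-to-scaffold and scaffold-to-spectrum directions, averaged.
    return 0.5 * (diag_cross_entropy(logits) + diag_cross_entropy(logits.T))


def rank_scaffolds(spec_vec, candidate_embs):
    """Return candidate indices sorted from most to least similar to the spectrum."""
    sims = l2_normalize(candidate_embs) @ l2_normalize(spec_vec[None, :]).T
    return np.argsort(-sims[:, 0])


def cross_attention(atom_feats, peak_feats, w_q, w_k, w_v):
    """Single-head scaled dot-product attention: atoms (queries) attend to peaks (keys/values)."""
    q = atom_feats @ w_q
    k = peak_feats @ w_k
    v = peak_feats @ w_v
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # each atom's weights over peaks sum to 1
    return weights @ v  # spectrum-informed update vector for every atom


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spectra = rng.normal(size=(4, 8))
    scaffolds = spectra + 0.05 * rng.normal(size=(4, 8))  # toy "aligned" embeddings
    print("contrastive loss:", info_nce_loss(spectra, scaffolds))
    print("ranking for spectrum 2:", rank_scaffolds(spectra[2], scaffolds))
```

The ranking formulation means the retriever only needs to score a finite candidate set rather than generate a scaffold from scratch; in a full model, the cross-attention update would be applied at each generation step so that spectral information conditions the whole process.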
