David Healey

Efficiently predicting high resolution mass spectra with graph neural networks

Jan 26, 2023

Michael Murphy, Stefanie Jegelka, Ernest Fraenkel, Tobias Kind, David Healey, Thomas Butler

Abstract: Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high-resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over molecular formulas. We discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification, GrAFF-MS, which achieves significantly lower prediction error and orders-of-magnitude faster runtime than state-of-the-art methods.
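The fixed-vocabulary idea in this abstract can be illustrated with a toy calculation. The sketch below is not the authors' code: the function names, the intensity-based ranking, and the four-formula corpus are all assumptions made for illustration. It keeps a chosen fraction of the distinct formulas and reports how much of the total signal intensity that subset retains.

```python
from collections import Counter

def build_vocabulary(spectra, fraction):
    """Keep the given fraction of distinct formulas, ranked by summed intensity.

    `spectra` is a list of spectra; each spectrum is a list of
    (formula, intensity) pairs. Ranking by summed intensity is an
    assumption of this sketch, not a detail taken from the paper.
    """
    totals = Counter()
    for spectrum in spectra:
        for formula, intensity in spectrum:
            totals[formula] += intensity
    keep = max(1, round(fraction * len(totals)))
    return {formula for formula, _ in totals.most_common(keep)}

def coverage(spectra, vocabulary):
    """Fraction of total intensity carried by formulas in the vocabulary."""
    covered = sum(i for s in spectra for f, i in s if f in vocabulary)
    total = sum(i for s in spectra for _, i in s)
    return covered / total

# Hypothetical toy corpus: two spectra, four distinct formulas.
toy_spectra = [
    [("C6H6", 0.7), ("C2H2", 0.2), ("CH3", 0.1)],
    [("C6H6", 0.6), ("C3H3", 0.4)],
]
vocab = build_vocabulary(toy_spectra, fraction=0.25)
toy_coverage = coverage(toy_spectra, vocab)
```

With a vocabulary fixed in advance, a predicted spectrum becomes a probability vector with one entry per vocabulary formula, which is why the output layer can resemble that of a classifier.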

Multi-scale Sinusoidal Embeddings Enable Learning on High Resolution Mass Spectrometry Data

Jul 06, 2022

Gennady Voronov, Rose Lightheart, Joe Davison, Christoph A. Krettler, David Healey, Thomas Butler

Abstract: Small molecules in biological samples are studied to provide information about disease states, environmental toxins, natural product drug discovery, and many other applications. The primary window into the composition of small molecule mixtures is tandem mass spectrometry (MS2), which produces data of high sensitivity and parts-per-million resolution. We adopt multi-scale sinusoidal embeddings of the mass data in MS2, designed to meet the challenge of learning from the full resolution of MS2 data. Using these embeddings, we provide a new state-of-the-art model for spectral library search, the standard task for initial evaluation of MS2 data. We also introduce a new task, chemical property prediction from MS2 data, which has natural applications in high-throughput MS2 experiments, and show that an average $R^2$ of 80% for novel compounds can be achieved across 10 chemical properties prioritized by medicinal chemists. We use dimensionality reduction techniques and experiments with different floating point resolutions to show the essential role multi-scale sinusoidal embeddings play in learning from MS2 data.
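As a rough illustration of what a multi-scale sinusoidal embedding of a mass value can look like, the sketch below encodes a single m/z with sine and cosine pairs whose wavelengths are spaced geometrically. The function name, the embedding size, and the wavelength range are assumptions chosen for illustration; the abstract does not state the paper's actual settings.

```python
import math

def sinusoidal_embedding(mz, dim=8, min_wavelength=1e-3, max_wavelength=1e3):
    """Encode one m/z value as `dim` sine/cosine features.

    Wavelengths are spaced geometrically between `min_wavelength` and
    `max_wavelength` (in m/z units), so short wavelengths resolve tiny
    mass differences while long ones capture coarse position.
    All defaults are hypothetical values for this sketch.
    """
    n_freqs = dim // 2
    features = []
    for k in range(n_freqs):
        # Position of this frequency on a log scale from 0 to 1.
        t = k / (n_freqs - 1) if n_freqs > 1 else 0.0
        wavelength = min_wavelength * (max_wavelength / min_wavelength) ** t
        angle = 2.0 * math.pi * mz / wavelength
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features

# Two masses that differ by only 0.0005 m/z still get distinct embeddings.
emb_a = sinusoidal_embedding(100.0000)
emb_b = sinusoidal_embedding(100.0005)
```

The point of mixing scales is that no single feature has to represent both the coarse mass and the parts-per-million detail; a network reading these features sees each at a comfortable numeric range.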
