Abstract:Nuclear Magnetic Resonance (NMR) spectroscopy is one of the major techniques in structural biology, with over 11,800 protein structures deposited in the Protein Data Bank. NMR can elucidate the structures and dynamics of small and medium-sized proteins in solution, living cells, and solids, but has been limited by a tedious data analysis process. Turning NMR measurements into a protein structure typically requires weeks or months of manual work by a trained expert. Automating this process is an open problem that was formulated in the field over 30 years ago. Here, we present the first approach that addresses this challenge. Our method, ARTINA, takes only NMR spectra and the protein sequence as input and delivers a structure strictly without human intervention. Tested on a 100-protein benchmark (1329 2D/3D/4D NMR spectra), ARTINA solved structures with a median RMSD of 1.44 Å to the PDB reference and with 91.36% correct NMR resonance assignments. ARTINA can be used by non-experts, reducing the effort of protein structure determination by NMR essentially to sample preparation and spectra measurement.
Abstract:Nuclear magnetic resonance (NMR) spectroscopy is one of the leading techniques for protein studies. Its properties make it possible to explain macromolecular interactions mechanistically and to resolve structures at atomic resolution. However, because data analysis is laborious, the full potential of NMR spectroscopy remains unexploited. Here we present an approach that aims to automate two major bottlenecks in the analysis pipeline, namely peak picking and chemical shift assignment. Our approach combines deep learning, non-parametric models, and combinatorial optimization. It detects signals of interest in multidimensional NMR data with high accuracy and matches them with atoms in medium-length protein sequences, a preliminary step toward solving the spatial structure of a protein.