Abstract:Cyclic molecules are ubiquitous across applications in chemistry and biology. Their restricted conformational flexibility provides structural pre-organization that is key to their function in drug discovery and catalysis. However, reliably sampling the conformer ensembles of ring systems remains challenging. Here, we introduce PuckerFlow, a generative machine learning model that performs flow matching on the Cremer-Pople space, a low-dimensional internal coordinate system capturing the relevant degrees of freedom of rings. Our approach enables generation of valid closed rings by design and demonstrates strong performance in generating conformers that are both diverse and precise. We show that PuckerFlow outperforms other conformer generation methods on nearly all quantitative metrics and illustrate the potential of PuckerFlow for ring systems relevant to chemical applications, particularly in catalysis and drug discovery. This work enables efficient and reliable conformer generation of cyclic structures, paving the way towards modeling structure-property relationships and the property-guided generation of rings across a wide range of applications in chemistry and biology.
Abstract:Chemical reactivity models are developed to predict chemical reaction outcomes in the form of classification (success/failure) or regression (product yield) tasks. The vast majority of the reported models are trained solely on chemical information such as reactants, products, reagents, and solvents, but not on the details of a synthetic protocol. Herein incorporation of procedural text with the aim to augment the Graphormer reactivity model and improve its accuracy is presented. Two major approaches are used: training an adapter Graphormer model that is provided with a GPT-2-derived latent representation of the text procedure (ReacLLaMA-Adapter) and labeling an unlabeled part of a dataset with the LLaMA 2 model followed by training the Graphormer on an extended dataset (Zero-Shot Labeling ReacLLaMA). Both methodologies enhance the discernment of unpromising reactions, thereby providing more accurate models with improved specificity.