Abstract: Predicting the ground-state conformation of a molecule from its molecular graph is crucial for many chemical applications, such as molecular modeling, molecular docking, and molecular property prediction. Recently, many learning-based methods have been proposed to replace time-consuming simulations for this task. However, these methods are often inefficient and sub-optimal because they rely solely on molecular graph information to make predictions from scratch. In this work, noting that low-quality molecular conformations are readily available, we propose a novel framework called ConfOpt that predicts the molecular ground-state conformation from the perspective of conformation optimization. Specifically, ConfOpt takes the molecular graph and a corresponding low-quality 3D conformation as inputs, and derives the ground-state conformation by iteratively optimizing the low-quality conformation under the guidance of the molecular graph. During training, ConfOpt concurrently optimizes the predicted atomic 3D coordinates and the corresponding interatomic distances, resulting in a strong predictive model. Extensive experiments demonstrate that ConfOpt significantly outperforms existing methods, thus providing a new paradigm for efficiently and accurately predicting molecular ground-state conformations.
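The abstract states that training supervises both the predicted atomic 3D coordinates and the corresponding interatomic distances, but it does not give the exact loss. The following is a minimal sketch under assumed choices, not ConfOpt's actual objective: mean squared error on coordinates (assuming prediction and reference already share a reference frame), mean squared error on the full pairwise distance matrix, and a hypothetical weighting parameter `dist_weight`.

```python
import numpy as np


def pairwise_distances(coords: np.ndarray) -> np.ndarray:
    """Return the (N, N) matrix of Euclidean distances between N atoms."""
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def joint_conformation_loss(pred: np.ndarray, target: np.ndarray,
                            dist_weight: float = 1.0) -> float:
    """Hypothetical joint loss: coordinate MSE plus interatomic-distance MSE.

    Assumes `pred` and `target` are (N, 3) arrays expressed in the same
    reference frame; `dist_weight` is an illustrative hyperparameter.
    """
    # Coordinate term: sensitive to translation and rotation of the prediction.
    coord_loss = np.mean((pred - target) ** 2)
    # Distance term: invariant to rigid translations and rotations.
    dist_loss = np.mean((pairwise_distances(pred) - pairwise_distances(target)) ** 2)
    return float(coord_loss + dist_weight * dist_loss)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ground_state = rng.normal(size=(5, 3))  # toy "ground-state" coordinates
    low_quality = ground_state + 0.1 * rng.normal(size=ground_state.shape)
    print(joint_conformation_loss(low_quality, ground_state))
```

Because the distance term ignores rigid motions, shifting the whole prediction changes only the coordinate term; the two terms therefore supervise complementary aspects of the geometry.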
Abstract: Over the past few decades, polymers, high-molecular-weight compounds formed by covalently bonding numerous identical or similar monomers, have played an essential role in various scientific fields. In this context, accurately predicting their properties is becoming increasingly important. Typically, the properties of a polymer, such as plasticity, conductivity, and biocompatibility, are highly correlated with its 3D structure. However, current methods for predicting polymer properties rely heavily on information from polymer SMILES sequences (P-SMILES strings) while ignoring crucial 3D structural information, leading to sub-optimal performance. In this work, we propose MMPolymer, a novel multimodal multitask pretraining framework that incorporates both polymer 1D sequential information and 3D structural information to enhance downstream polymer property prediction tasks. In addition, to overcome the limited availability of polymer 3D data, we propose the "Star Substitution" strategy to extract 3D structural information effectively. During pretraining, MMPolymer not only predicts masked tokens and recovers 3D coordinates but also aligns the latent representations across modalities. We then fine-tune the pretrained MMPolymer on downstream polymer property prediction tasks in a supervised learning paradigm. Experimental results demonstrate that MMPolymer achieves state-of-the-art performance on various polymer property prediction tasks. Moreover, even when only one modality (either the P-SMILES string or the 3D conformation) is used during fine-tuning, the pretrained MMPolymer still surpasses existing polymer property prediction methods, highlighting its exceptional capability in polymer feature extraction and utilization. Our online platform for polymer property prediction is available at https://app.bohrium.dp.tech/mmpolymer.
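The abstract names three pretraining objectives (masked-token prediction, 3D coordinate recovery, and cross-modal alignment of latent representations) but does not specify their form or how they are combined. As a hedged illustration only, the sketch below assumes a weighted sum of masked-token cross-entropy, coordinate mean squared error, and a symmetric InfoNCE contrastive term between sequence and 3D embeddings. All function names, the temperature, and the `weights` tuple are hypothetical and are not taken from MMPolymer.

```python
import numpy as np


def _log_softmax(logits: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable log-softmax along `axis`."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def masked_token_loss(token_logits, token_targets, mask) -> float:
    """Cross-entropy averaged over masked positions only.

    token_logits: (B, L, V) floats; token_targets: (B, L) ints; mask: (B, L) bools.
    """
    log_probs = _log_softmax(token_logits, axis=-1)
    nll = -np.take_along_axis(log_probs, token_targets[..., None], axis=-1)[..., 0]
    return float(nll[mask].mean())


def coordinate_recovery_loss(pred_coords, true_coords) -> float:
    """Mean squared error between predicted and reference 3D coordinates."""
    return float(np.mean((pred_coords - true_coords) ** 2))


def info_nce_alignment(z_seq, z_3d, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE: row i of z_seq and row i of z_3d describe the same polymer."""
    z_seq = z_seq / np.linalg.norm(z_seq, axis=1, keepdims=True)
    z_3d = z_3d / np.linalg.norm(z_3d, axis=1, keepdims=True)
    logits = z_seq @ z_3d.T / temperature
    idx = np.arange(len(logits))
    seq_to_3d = -_log_softmax(logits, axis=1)[idx, idx].mean()
    three_d_to_seq = -_log_softmax(logits, axis=0)[idx, idx].mean()
    return float(0.5 * (seq_to_3d + three_d_to_seq))


def pretraining_loss(token_logits, token_targets, mask, pred_coords, true_coords,
                     z_seq, z_3d, weights=(1.0, 1.0, 1.0)) -> float:
    """Hypothetical weighted sum of the three pretraining objectives."""
    w_mlm, w_coord, w_align = weights
    return (w_mlm * masked_token_loss(token_logits, token_targets, mask)
            + w_coord * coordinate_recovery_loss(pred_coords, true_coords)
            + w_align * info_nce_alignment(z_seq, z_3d))
```

The contrastive term treats each polymer's P-SMILES and 3D embeddings as a positive pair and the other polymers in the batch as negatives, which is one common way to realize cross-modal alignment; other alignment losses would fit the description equally well.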