Abstract:Although template-free models are acknowledged to explore unseen reaction spaces better than template-based models for retrosynthesis prediction, their ability to venture beyond established boundaries remains relatively uncharted. In this study, we empirically assess the extrapolation capability of state-of-the-art template-free models by meticulously assembling an extensive set of out-of-distribution (OOD) reactions. Our findings demonstrate that while template-free models exhibit potential in predicting precursors with novel synthesis rules, their top-10 exact-match accuracy on OOD reactions is strikingly modest (< 1%). Furthermore, although these models can generate novel reactions, our investigation highlights a recurring issue: more than half of the novel reactions they predict are chemically implausible. Consequently, we advocate for the future development of template-free models that integrate considerations of chemical feasibility when navigating unexplored regions of reaction space.
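To make the top-10 exact-match metric concrete, the following is a minimal sketch of how such an accuracy could be computed. It is not the authors' evaluation code. The function names `normalize` and `top_k_exact_match` are hypothetical, the SMILES strings are illustrative, and the sketch assumes each SMILES string has already been canonicalized by a cheminformatics toolkit, so that only the order of the dot-separated precursors needs to be normalized.

```python
def normalize(smiles: str) -> str:
    """Sort dot-separated precursors so that their order does not matter.

    Assumption: each fragment is already a canonical SMILES string; real
    canonicalization needs a cheminformatics toolkit and is not done here.
    """
    return ".".join(sorted(smiles.split(".")))


def top_k_exact_match(predictions, references, k=10):
    """Fraction of targets whose reference precursors appear in the top-k list.

    predictions: one ranked list of candidate precursor SMILES per target.
    references:  one ground-truth precursor SMILES per target.
    """
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have equal length")
    if not references:
        return 0.0
    hits = 0
    for candidates, reference in zip(predictions, references):
        top_k = {normalize(c) for c in candidates[:k]}
        if normalize(reference) in top_k:
            hits += 1
    return hits / len(references)


# Hypothetical toy data: three targets, only the first is matched.
predictions = [["CCO.CC(=O)O", "CCO"], ["c1ccccc1"], []]
references = ["CC(=O)O.CCO", "CCN", "C"]
accuracy = top_k_exact_match(predictions, references, k=10)
```

Under this definition a prediction counts only if the full precursor set matches exactly, which is why a chemically valid but different route still scores as a miss.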
Abstract:Deep learning has fostered many novel applications in materials informatics. However, the inverse design of inorganic crystals, $\textit{i.e.}$ generating new crystal structures with targeted properties, remains a grand challenge. An important ingredient for such generative models is an invertible representation that accesses the full periodic table. This is challenging due to limited data availability and the complexity of 3D periodic crystal structures. In this paper, we present a generalized invertible representation that encodes the crystallographic information into descriptors in both real space and reciprocal space. Combined with a generative variational autoencoder (VAE), this representation allows a wide range of crystallographic structures and chemistries with desired properties to be inverse-designed. We show that our VAE model predicts novel crystal structures that do not exist in the training and test database (Materials Project) with targeted formation energies and band gaps. We validate those predicted crystals by first-principles calculations. Finally, to design solids with practical applications, we address the sparse label problem by building a semi-supervised VAE and demonstrate its successful prediction of unique thermoelectric materials.
Abstract:In the conventional chemisorption model, the d-band center theory (sometimes augmented with the upper edge of the d-band for improved accuracy) plays a central role in predicting adsorption energies and catalytic activity as a function of the d-band center of solid surfaces, but it requires density functional calculations that can be quite costly for large-scale materials screening. In this work, we propose to use the d-band width of the muffin-tin orbital theory (to account for the local coordination environment) plus electronegativity (to account for adsorbate renormalization) as a simple set of alternative descriptors for chemisorption, which do not demand ab initio calculations. This pair of descriptors is then combined with machine learning methods, namely, an artificial neural network (ANN) and kernel ridge regression (KRR), to allow large-scale materials screening. We show, for a toy set of 263 alloy systems, that the CO adsorption energy can be predicted with a remarkably small mean absolute deviation of 0.05 eV, a significant improvement over the 0.13 eV obtained in the literature with descriptors that include costly d-band center calculations. We achieved this high accuracy by utilizing an active learning algorithm, without which the error was 0.18 eV. As a practical application of this machine learning model, we identified Cu3Y@Cu as a highly active and cost-effective electrochemical CO2 reduction catalyst that produces CO with an overpotential 0.37 V lower than that of a Au catalyst.
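As an illustration of the regression step, the following is a minimal sketch of kernel ridge regression on a two-column descriptor matrix, scored by mean absolute error. It is not the authors' model. The RBF kernel, the hyperparameters `lam` and `gamma`, and all function names are assumptions, and the data are synthetic stand-ins for the two descriptors (d-band width and electronegativity), not real adsorption energies. The active learning step mentioned in the abstract is not shown.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def krr_fit(X, y, lam=1e-3, gamma=1.0):
    """Solve (K + lam * I) alpha = y for the dual coefficients alpha."""
    K = rbf_kernel(X, X, gamma)
    return np.linalg.solve(K + lam * np.eye(len(X)), y)


def krr_predict(X_train, alpha, X_new, gamma=1.0):
    """Predict targets for X_new from the fitted dual coefficients."""
    return rbf_kernel(X_new, X_train, gamma) @ alpha


def mean_absolute_error(y_true, y_pred):
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


# Synthetic, hypothetical data: two descriptor columns scaled to [0, 1]
# and a smooth made-up target standing in for an adsorption energy.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(40, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1]

alpha = krr_fit(X, y)
pred = krr_predict(X, alpha, X)
train_mae = mean_absolute_error(y, pred)
```

A training-set error such as `train_mae` only checks that the fit works; an error comparable to the figures quoted in the abstract would have to be measured on held-out systems.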