Abstract:Graph-based machine learning models for materials properties show great potential to accelerate virtual high-throughput screening of large chemical spaces. However, in their simplest forms, graph-based models do not include any 3D information and are unable to distinguish stereoisomers such as those arising from different orderings of ligands around a metal center in coordination complexes. In this work we present a modification to revised autocorrelation descriptors, our molecular graph featurization method for machine learning various spin state dependent properties of octahedral transition metal complexes (TMCs). Inspired by analytical semi-empirical models for TMCs, the new modeling strategy is based on the many-body expansion (MBE) and allows one to tune the captured stereoisomer information by changing the truncation order of the MBE. We present the necessary modifications to include this approach in two commonly used machine learning methods, kernel ridge regression and feed-forward neural networks. On a test set composed of all possible isomers of binary transition metal complexes, the best MBE models achieve mean absolute errors of 2.75 kcal/mol on spin-splitting energies and 0.26 eV on frontier orbital energy gaps, a 30-40% reduction in error compared to models based on our previous approach. We also observe improved generalization to previously unseen ligands where the best-performing models exhibit mean absolute errors of 4.00 kcal/mol (i.e., a 0.73 kcal/mol reduction) on the spin-splitting energies and 0.53 eV (i.e., a 0.10 eV reduction) on the frontier orbital energy gaps. Because the new approach incorporates insights from electronic structure theory, such as ligand additivity relationships, these models exhibit systematic generalization from homoleptic to heteroleptic complexes, allowing for efficient screening of TMC search spaces.
Abstract:Transition states (TSs) are transient structures that are key in understanding reaction mechanisms and designing catalysts but challenging to be captured in experiments. Alternatively, many optimization algorithms have been developed to search for TSs computationally. Yet the cost of these algorithms driven by quantum chemistry methods (usually density functional theory) is still high, posing challenges for their applications in building large reaction networks for reaction exploration. Here we developed React-OT, an optimal transport approach for generating unique TS structures from reactants and products. React-OT generates highly accurate TS structures with a median structural root mean square deviation (RMSD) of 0.053{\AA} and median barrier height error of 1.06 kcal/mol requiring only 0.4 second per reaction. The RMSD and barrier height error is further improved by roughly 25% through pretraining React-OT on a large reaction dataset obtained with a lower level of theory, GFN2-xTB. We envision the great accuracy and fast inference of React-OT useful in targeting TSs when exploring chemical reactions with unknown mechanisms.
Abstract:Transition state (TS) search is key in chemistry for elucidating reaction mechanisms and exploring reaction networks. The search for accurate 3D TS structures, however, requires numerous computationally intensive quantum chemistry calculations due to the complexity of potential energy surfaces. Here, we developed an object-aware SE(3) equivariant diffusion model that satisfies all physical symmetries and constraints for generating sets of structures - reactant, TS, and product - in an elementary reaction. Provided reactant and product, this model generates a TS structure in seconds instead of hours required when performing quantum chemistry-based optimizations. The generated TS structures achieve a median of 0.08 {\AA} root mean square deviation compared to the true TS. With a confidence scoring model for uncertainty quantification, we approach an accuracy required for reaction rate estimation (2.6 kcal/mol) by only performing quantum chemistry-based optimizations on 14\% of the most challenging reactions. We envision the proposed approach useful in constructing large reaction networks with unknown mechanisms.
Abstract:High-throughput screening of large hypothetical databases of metal-organic frameworks (MOFs) can uncover new materials, but their stability in real-world applications is often unknown. We leverage community knowledge and machine learning (ML) models to identify MOFs that are thermally stable and stable upon activation. We separate these MOFs into their building blocks and recombine them to make a new hypothetical MOF database of over 50,000 structures that samples orders of magnitude more connectivity nets and inorganic building blocks than prior databases. This database shows an order of magnitude enrichment of ultrastable MOF structures that are stable upon activation and more than one standard deviation more thermally stable than the average experimentally characterized MOF. For the nearly 10,000 ultrastable MOFs, we compute bulk elastic moduli to confirm these materials have good mechanical stability, and we report methane deliverable capacities. Our work identifies privileged metal nodes in ultrastable MOFs that optimize gas storage and mechanical stability simultaneously.
Abstract:Photoactive iridium complexes are of broad interest due to their applications ranging from lighting to photocatalysis. However, the excited state property prediction of these complexes challenges ab initio methods such as time-dependent density functional theory (TDDFT) both from an accuracy and a computational cost perspective, complicating high throughput virtual screening (HTVS). We instead leverage low-cost machine learning (ML) models to predict the excited state properties of photoactive iridium complexes. We use experimental data of 1,380 iridium complexes to train and evaluate the ML models and identify the best-performing and most transferable models to be those trained on electronic structure features from low-cost density functional theory tight binding calculations. Using these models, we predict the three excited state properties considered, mean emission energy of phosphorescence, excited state lifetime, and emission spectral integral, with accuracy competitive with or superseding TDDFT. We conduct feature importance analysis to identify which iridium complex attributes govern excited state properties and we validate these trends with explicit examples. As a demonstration of how our ML models can be used for HTVS and the acceleration of chemical discovery, we curate a set of novel hypothetical iridium complexes and identify promising ligands for the design of new phosphors.
Abstract:Two outstanding challenges for machine learning (ML) accelerated chemical discovery are the synthesizability of candidate molecules or materials and the fidelity of the data used in ML model training. To address the first challenge, we construct a hypothetical design space of 32.5M transition metal complexes (TMCs), in which all of the constituent fragments (i.e., metals and ligands) and ligand symmetries are synthetically accessible. To address the second challenge, we search for consensus in predictions among 23 density functional approximations across multiple rungs of Jacob's ladder. To accelerate the screening of these 32.5M TMCs, we use efficient global optimization to sample candidate low-spin chromophores that simultaneously have low absorption energies and low static correlation. Despite the scarcity (i.e., $<$ 0.01\%) of potential chromophores in this large chemical space, we identify transition metal chromophores with high likelihood (i.e., $>$ 10\%) as the ML models improve during active learning. This represents a 1,000 fold acceleration in discovery corresponding to discoveries in days instead of years. Analyses of candidate chromophores reveal a preference for Co(III) and large, strong-field ligands with more bond saturation. We compute the absorption spectra of promising chromophores on the Pareto front by time-dependent density functional theory calculations and verify that two thirds of them have desired excited state properties. Although these complexes have never been experimentally explored, their constituent ligands demonstrated interesting optical properties in literature, exemplifying the effectiveness of our construction of realistic TMC design space and active learning approach.
Abstract:Approximate density functional theory (DFT) has become indispensable owing to its cost-accuracy trade-off in comparison to more computationally demanding but accurate correlated wavefunction theory. To date, however, no single density functional approximation (DFA) with universal accuracy has been identified, leading to uncertainty in the quality of data generated from DFT. With electron density fitting and transfer learning, we build a DFA recommender that selects the DFA with the lowest expected error with respect to gold standard but cost-prohibitive coupled cluster theory in a system-specific manner. We demonstrate this recommender approach on vertical spin-splitting energy evaluation for challenging transition metal complexes. Our recommender predicts top-performing DFAs and yields excellent accuracy (ca. 2 kcal/mol) for chemical discovery, outperforming both individual transfer learning models and the single best functional in a set of 48 DFAs. We demonstrate the transferability of the DFA recommender to experimentally synthesized compounds with distinct chemistry.
Abstract:Accelerated discovery with machine learning (ML) has begun to provide the advances in efficiency needed to overcome the combinatorial challenge of computational materials design. Nevertheless, ML-accelerated discovery both inherits the biases of training data derived from density functional theory (DFT) and leads to many attempted calculations that are doomed to fail. Many compelling functional materials and catalytic processes involve strained chemical bonds, open-shell radicals and diradicals, or metal-organic bonds to open-shell transition-metal centers. Although promising targets, these materials present unique challenges for electronic structure methods and combinatorial challenges for their discovery. In this Perspective, we describe the advances needed in accuracy, efficiency, and approach beyond what is typical in conventional DFT-based ML workflows. These challenges have begun to be addressed through ML models trained to predict the results of multiple methods or the differences between them, enabling quantitative sensitivity analysis. For DFT to be trusted for a given data point in a high-throughput screen, it must pass a series of tests. ML models that predict the likelihood of calculation success and detect the presence of strong correlation will enable rapid diagnoses and adaptation strategies. These "decision engines" represent the first steps toward autonomous workflows that avoid the need for expert determination of the robustness of DFT-based materials discoveries.
Abstract:Accurate virtual high-throughput screening (VHTS) of transition metal complexes (TMCs) remains challenging due to the possibility of high multi-reference (MR) character that complicates property evaluation. We compute MR diagnostics for over 5,000 ligands present in previously synthesized transition metal complexes in the Cambridge Structural Database (CSD). To accomplish this task, we introduce an iterative approach for consistent ligand charge assignment for ligands in the CSD. Across this set, we observe that MR character correlates linearly with the inverse value of the averaged bond order over all bonds in the molecule. We then demonstrate that ligand additivity of MR character holds in TMCs, which suggests that the TMC MR character can be inferred from the sum of the MR character of the ligands. Encouraged by this observation, we leverage ligand additivity and develop a ligand-derived machine learning representation to train neural networks to predict the MR character of TMCs from properties of the constituent ligands. This approach yields models with excellent performance and superior transferability to unseen ligand chemistry and compositions.
Abstract:Virtual high throughput screening (VHTS) and machine learning (ML) have greatly accelerated the design of single-site transition-metal catalysts. VHTS of catalysts, however, is often accompanied with high calculation failure rate and wasted computational resources due to the difficulty of simultaneously converging all mechanistically relevant reactive intermediates to expected geometries and electronic states. We demonstrate a dynamic classifier approach, i.e., a convolutional neural network that monitors geometry optimization on the fly, and exploit its good performance and transferability for catalyst design. We show that the dynamic classifier performs well on all reactive intermediates in the representative catalytic cycle of the radical rebound mechanism for methane-to-methanol despite being trained on only one reactive intermediate. The dynamic classifier also generalizes to chemically distinct intermediates and metal centers absent from the training data without loss of accuracy or model confidence. We rationalize this superior model transferability to the use of on-the-fly electronic structure and geometric information generated from density functional theory calculations and the convolutional layer in the dynamic classifier. Combined with model uncertainty quantification, the dynamic classifier saves more than half of the computational resources that would have been wasted on unsuccessful calculations for all reactive intermediates being considered.