Abstract:Graph neural networks (GNNs) have excelled in predictive modeling for both crystals and molecules, owing to the expressiveness of graph representations. High-entropy alloys (HEAs), however, lack chemical long-range order, limiting the applicability of current graph representations. To overcome this challenge, we propose a representation of HEAs as a collection of local environment (LE) graphs. Based on this representation, we introduce the LESets machine learning model, an accurate, interpretable GNN for HEA property prediction. We demonstrate the accuracy of LESets in modeling the mechanical properties of quaternary HEAs. Through analyses and interpretation, we further extract insights into the modeling and design of HEAs. In a broader sense, LESets extends the potential applicability of GNNs to disordered materials with combinatorial complexity formed by diverse constituents and their flexible configurations.
Abstract:Recent advances in machine learning (ML) have expedited materials discovery and design. One significant challenge faced in ML for materials is the expansive combinatorial space of potential materials formed by diverse constituents and their flexible configurations. This complexity is particularly evident in molecular mixtures, a frequently explored space for materials such as battery electrolytes. Owing to the complex structures of molecules and the sequence-independent nature of mixtures, conventional ML methods have difficulties in modeling such systems. Here we present MolSets, a specialized ML model for molecular mixtures. Representing individual molecules as graphs and their mixture as a set, MolSets leverages a graph neural network and the deep sets architecture to extract information at the molecule level and aggregate it at the mixture level, thus addressing local complexity while retaining global flexibility. We demonstrate the efficacy of MolSets in predicting the conductivity of lithium battery electrolytes and highlight its benefits in virtual screening of the combinatorial chemical space.
Abstract:Growing materials data and data-driven informatics drastically promote the discovery and design of materials. While there are significant advancements in data-driven models, the quality of data resources is less studied despite its huge impact on model performance. In this work, we focus on data bias arising from uneven coverage of materials families in existing knowledge. Observing different diversities among crystal systems in common materials databases, we propose an information entropy-based metric for measuring this bias. To mitigate the bias, we develop an entropy-targeted active learning (ET-AL) framework, which guides the acquisition of new data to improve the diversity of underrepresented crystal systems. We demonstrate the capability of ET-AL for bias mitigation and the resulting improvement in downstream machine learning models. This approach is broadly applicable to data-driven materials discovery, including autonomous data acquisition and dataset trimming to reduce bias, as well as data-driven informatics in other scientific domains.