Abstract:We propose a data-driven method to extract dissimilarity between materials, with respect to a given target physical property. The technique is based on an ensemble method with Kernel ridge regression as the predicting model; multiple random subset sampling of the materials is done to generate prediction models and the corresponding contributions of the reference training materials in detail. The distribution of the predicted values for each material can be approximated by a Gaussian mixture model. The reference training materials contributed to the prediction model that accurately predicts the physical property value of a specific material, are considered to be similar to that material, or vice versa. Evaluations using synthesized data demonstrate that the proposed method can effectively measure the dissimilarity between data instances. An application of the analysis method on the data of Curie temperature (TC) of binary 3d transition metal 4f rare earth binary alloys also reveals meaningful results on the relations between the materials. The proposed method can be considered as a potential tool for obtaining a deeper understanding of the structure of data, with respect to a target property, in particular.
Abstract:In this study, we establish a basis for selecting similarity measures when applying machine learning techniques to solve materials science problems. This selection is considered with an emphasis on the distinctiveness between materials that reflect their nature well. We perform a case study with a dataset of rare-earth transition metal crystalline compounds represented using the Orbital Field Matrix descriptor and the Coulomb Matrix descriptor. We perform predictions of the formation energies using k-nearest neighbors regression, ridge regression, and kernel ridge regression. Through detailed analyses of the yield prediction accuracy, we examine the relationship between the characteristics of the material representation and similarity measures, and the complexity of the energy function they can capture. Empirical experiments and theoretical analysis reveal that similarity measures and kernels that minimize the loss of materials distinctiveness improve the prediction performance.