Abstract:The development of large databases of material properties, together with the availability of powerful computers, has allowed machine learning (ML) modeling to become a widely used tool for predicting material performances. While confidence intervals are commonly reported for such ML models, prediction intervals, i.e., the uncertainty on each prediction, are not as frequently available. Here, we investigate three easy-to-implement approaches to determine such individual uncertainty, comparing them across ten ML quantities spanning energetics, mechanical, electronic, optical, and spectral properties. Specifically, we focused on the Quantile approach, the direct machine learning of the prediction intervals and Ensemble methods.
Abstract:Uncertainty quantification in Artificial Intelligence (AI)-based predictions of material properties is of immense importance for the success and reliability of AI applications in material science. While confidence intervals are commonly reported for machine learning (ML) models, prediction intervals, i.e., the evaluation of the uncertainty on each prediction, are seldomly available. In this work we compare 3 different approaches to obtain such individual uncertainty, testing them on 12 ML-physical properties. Specifically, we investigated using the Quantile loss function, machine learning the prediction intervals directly and using Gaussian Processes. We identify each approachs advantages and disadvantages and end up slightly favoring the modeling of the individual uncertainties directly, as it is the easiest to fit and, in most cases, minimizes over-and under-estimation of the predicted errors. All data for training and testing were taken from the publicly available JARVIS-DFT database, and the codes developed for computing the prediction intervals are available through JARVIS-Tools.