Abstract:Quantifying prediction uncertainty when applying object detection models to new, unlabeled datasets is critical in applied machine learning. This study introduces an approach to estimate the performance of deep learning-based object detection models for quantifying defects in transmission electron microscopy (TEM) images, focusing on detecting irradiation-induced cavities in TEM images of metal alloys. We developed a random forest regression model that predicts the object detection F1 score, a statistical metric used to evaluate the ability to accurately locate and classify objects of interest. The random forest model uses features extracted from the predictions of the object detection model whose uncertainty is being quantified, enabling fast prediction on new, unlabeled images. The mean absolute error (MAE) for predicting F1 of the trained model on test data is 0.09, and the $R^2$ score is 0.77, indicating there is a significant correlation between the random forest regression model predicted and true defect detection F1 scores. The approach is shown to be robust across three distinct TEM image datasets with varying imaging and material domains. Our approach enables users to estimate the reliability of a defect detection and segmentation model predictions and assess the applicability of the model to their specific datasets, providing valuable information about possible domain shifts and whether the model needs to be fine-tuned or trained on additional data to be maximally effective for the desired use case.
Abstract:Ensemble models can be used to estimate prediction uncertainties in machine learning models. However, an ensemble of N models is approximately N times more computationally demanding compared to a single model when it is used for inference. In this work, we explore fitting a single model to predicted ensemble error bar data, which allows us to estimate uncertainties without the need for a full ensemble. Our approach is based on three models: Model A for predictive accuracy, Model $A_{E}$ for traditional ensemble-based error bar prediction, and Model B, fit to data from Model $A_{E}$, to be used for predicting the values of $A_{E}$ but with only one model evaluation. Model B leverages synthetic data augmentation to estimate error bars efficiently. This approach offers a highly flexible method of uncertainty quantification that can approximate that of ensemble methods but only requires a single extra model evaluation over Model A during inference. We assess this approach on a set of problems in materials science.