Abstract: Machine learning (ML) models have shown success in applications with an objective of prediction, but the algorithmic complexity of some models makes them difficult to interpret. Methods have been proposed to provide insight into these "black-box" models, but there is little research that focuses on supervised ML when the model inputs are functional data. In this work, we consider two applications from high-consequence spaces with objectives of making predictions using functional data inputs. One application aims to classify material types to identify explosive materials given hyperspectral computed tomography scans of the materials. The other application considers the forensic science task of connecting an inkjet-printed document to its source printer using color signatures extracted by Raman spectroscopy. An instinctive route for analyzing these data is a data-driven ML model for classification, but due to the high-consequence nature of the applications, we argue it is important to appropriately account for the nature of the data in the analysis so as not to obscure or misrepresent patterns. As such, we propose the Variable importance Explainable Elastic Shape Analysis (VEESA) pipeline for training ML models with functional data that (1) accounts for the vertical and horizontal variability in the functional data and (2) provides an explanation in the original data space of how the model uses variability in the functional data for prediction. The pipeline makes use of elastic functional principal components analysis (efPCA) to generate uncorrelated model inputs and permutation feature importance (PFI) to identify the principal components important for prediction. The variability captured by the important principal components is visualized in the original data space. We conclude by discussing natural extensions of the VEESA pipeline and challenges for future research.
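A minimal sketch of this style of pipeline is given below. It uses ordinary PCA from scikit-learn as a stand-in for elastic fPCA (the alignment/warping step is assumed to have been handled beforehand), a random forest as a placeholder classifier, and synthetic data; the variable names and model choices are illustrative assumptions, not the VEESA implementation itself.

```python
# Sketch: PCA scores as uncorrelated model inputs + permutation feature importance.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
curves = rng.normal(size=(200, 100))   # 200 (pre-aligned) functions on a 100-point grid
labels = rng.integers(0, 2, size=200)  # binary class labels (hypothetical)

# (1) Transform the curves into uncorrelated principal component scores.
pca = PCA(n_components=10)
scores = pca.fit_transform(curves)

# (2) Train a classifier on the component scores.
X_tr, X_te, y_tr, y_te = train_test_split(scores, labels, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)

# (3) Permutation feature importance: drop in held-out accuracy when a
#     component's scores are shuffled.
pfi = permutation_importance(model, X_te, y_te, n_repeats=25, random_state=0)
for k in np.argsort(pfi.importances_mean)[::-1][:3]:
    print(f"PC {k + 1}: importance = {pfi.importances_mean[k]:.3f}")
```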
Abstract: The 2022 National Defense Strategy of the United States listed climate change as a serious threat to national security. Climate intervention methods, such as stratospheric aerosol injection, have been proposed as mitigation strategies, but the downstream effects of such actions on a complex climate system are not well understood. The development of algorithmic techniques for quantifying relationships between source and impact variables related to a climate event (i.e., a climate pathway) would help inform policy decisions. Data-driven deep learning models have become powerful tools for modeling highly nonlinear relationships and may provide a route to characterize climate variable relationships. In this paper, we explore the use of an echo state network (ESN) for characterizing climate pathways. ESNs are a computationally efficient neural network variation designed for temporal data, and recent work proposes ESNs as a useful tool for forecasting spatio-temporal climate data. Like other neural networks, ESNs are non-interpretable black-box models, which poses a hurdle for understanding variable relationships. We address this issue by developing feature importance methods for ESNs in the context of spatio-temporal data to quantify variable relationships captured by the model. We conduct a simulation study to assess and compare the feature importance techniques, and we demonstrate the approach on reanalysis climate data. In the climate application, we select a time period that includes the 1991 volcanic eruption of Mount Pinatubo. This event was a significant stratospheric aerosol injection, which we use as a proxy for an artificial stratospheric aerosol injection. Using the proposed approach, we are able to characterize relationships between pathway variables associated with this event.
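A minimal NumPy sketch of an ESN with a permutation-based importance measure on the input variables is shown below. The reservoir size, leak rate, ridge penalty, and the synthetic input/target series are all assumptions for illustration, and the importance measure is a simple permute-and-remeasure-error scheme rather than the specific methods developed in the paper.

```python
# Sketch: echo state network with a ridge readout and permutation importance
# computed per input variable.
import numpy as np

rng = np.random.default_rng(1)
T, n_in, n_res = 500, 4, 200
u = rng.normal(size=(T, n_in))                       # hypothetical input variables
y = np.roll(u[:, 0], 1) + 0.1 * rng.normal(size=T)   # hypothetical target series

W_in = rng.uniform(-0.5, 0.5, size=(n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # scale spectral radius below 1

def reservoir_states(inputs, leak=0.5):
    """Run the leaky-integrator reservoir over the input sequence."""
    x = np.zeros(n_res)
    states = np.zeros((len(inputs), n_res))
    for t, u_t in enumerate(inputs):
        x = (1 - leak) * x + leak * np.tanh(W_in @ u_t + W @ x)
        states[t] = x
    return states

# Ridge-regression readout trained on the reservoir states.
X = reservoir_states(u)
ridge = 1e-6
W_out = np.linalg.solve(X.T @ X + ridge * np.eye(n_res), X.T @ y)

def mse(inputs):
    return np.mean((reservoir_states(inputs) @ W_out - y) ** 2)

# Importance of each input variable: increase in error after permuting it.
base = mse(u)
for j in range(n_in):
    u_perm = u.copy()
    u_perm[:, j] = rng.permutation(u_perm[:, j])
    print(f"variable {j}: importance = {mse(u_perm) - base:.4f}")
```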
Abstract: Optical spectral-temporal signatures extracted from videos of explosions provide information for identifying characteristics of the corresponding explosive devices. Currently, the identification is done using heuristic algorithms and direct subject matter expert review. An improvement in predictive performance may be obtained by using machine learning, but this application lends itself to high-consequence national security decisions, so it is important to provide not only high accuracy but also clear explanations for the predictions to garner confidence in the model. While much work has been done to develop explainability methods for machine learning models, little of it focuses on situations where the input variables are functional data, such as optical spectral-temporal signatures. We propose a procedure for explaining machine learning models fit using functional data that accounts for the functional nature of the data. Our approach makes use of functional principal component analysis (fPCA) and permutation feature importance (PFI). fPCA is used to transform the functions into uncorrelated functional principal components (fPCs). The model is trained using the fPCs as inputs, and PFI is applied to identify the fPCs important to the model for prediction. Visualizations of the variability explained by the important fPCs then reveal the aspects of the functions that drive the predictions. We demonstrate the technique by explaining neural networks fit to explosion optical spectral-temporal signatures for predicting characteristics of the explosive devices.
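The sketch below illustrates only the visualization step under simple assumptions: ordinary PCA on synthetic curves stands in for fPCA, and the component index `k` plays the role of an fPC already flagged as important by PFI. The plotted curves show, in the original data space, how functions vary along that component (mean function plus or minus two standard deviations).

```python
# Sketch: visualizing the variability captured by an important component.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
grid = np.linspace(0, 1, 100)
# Synthetic signatures: a peak whose height and width vary across observations.
curves = np.array([a * np.exp(-((grid - 0.5) ** 2) / (2 * w ** 2))
                   for a, w in zip(rng.uniform(1, 2, 150),
                                   rng.uniform(0.05, 0.15, 150))])

pca = PCA(n_components=5).fit(curves)
k = 0  # index of the component assumed to be flagged as important by PFI
sd = np.sqrt(pca.explained_variance_[k])

plt.plot(grid, pca.mean_, "k", label="mean function")
for c in (-2, 2):
    plt.plot(grid, pca.mean_ + c * sd * pca.components_[k], "--",
             label=f"mean {'+' if c > 0 else '-'} 2 SD along component {k + 1}")
plt.xlabel("time")
plt.legend()
plt.show()
```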