Abstract:Recent years have seen a growing concern about climate change and its impacts. While Earth System Models (ESMs) can be invaluable tools for studying the impacts of climate change, the complex coupling processes encoded in ESMs and the large amounts of data produced by these models, together with the high internal variability of the Earth system, can obscure important source-to-impact relationships. This paper presents a novel and efficient unsupervised data-driven approach for detecting statistically-significant impacts and tracing spatio-temporal source-impact pathways in the climate through a unique combination of ideas from anomaly detection, clustering and Natural Language Processing (NLP). Using as an exemplar the 1991 eruption of Mount Pinatubo in the Philippines, we demonstrate that the proposed approach is capable of detecting known post-eruption impacts/events. We additionally describe a methodology for extracting meaningful sequences of post-eruption impacts/events by using NLP to efficiently mine frequent multivariate cluster evolutions, which can be used to confirm or discover the chain of physical processes between a climate source and its impact(s).
Abstract:Disturbances to the climate system, both natural and anthropogenic, have far reaching impacts that are not always easy to identify or quantify using traditional climate science analyses or causal modeling techniques. In this paper, we develop a novel technique for discovering and ranking the chain of spatio-temporal downstream impacts of a climate source, referred to herein as a source-impact pathway, using Random Forest Regression (RFR) and SHapley Additive exPlanation (SHAP) feature importances. Rather than utilizing RFR for classification or regression tasks (the most common use case for RFR), we propose a fundamentally new RFR-based workflow in which we: (i) train random forest (RF) regressors on a set of spatio-temporal features of interest, (ii) calculate their pair-wise feature importances using the SHAP weights associated with those features, and (iii) translate these feature importances into a weighted pathway network (i.e., a weighted directed graph), which can be used to trace out and rank interdependencies between climate features and/or modalities. We adopt a tiered verification approach to verify our new pathway identification methodology. In this approach, we apply our method to ensembles of data generated by running two increasingly complex benchmarks: (i) a set of synthetic coupled equations, and (ii) a fully coupled simulation of the 1991 eruption of Mount Pinatubo in the Philippines performed using a modified version 2 of the U.S. Department of Energy's Energy Exascale Earth System Model (E3SMv2). We find that our RFR feature importance-based approach can accurately detect known pathways of impact for both test cases.