Abstract:The Greenland Ice Sheet (GrIS) has emerged as a significant contributor to global sea level rise, primarily due to increased meltwater runoff. Supraglacial lakes, which form on the ice sheet surface during the summer months, can impact ice sheet dynamics and mass loss; thus, better understanding these lakes' seasonal evolution and dynamics is an important task. This study presents a computationally efficient time series classification approach that uses Gaussian Mixture Models (GMMs) of the Reconstructed Phase Spaces (RPSs) to identify supraglacial lakes based on their seasonal evolution: 1) those that refreeze at the end of the melt season, 2) those that drain during the melt season, and 3) those that become buried, remaining liquid insulated a few meters beneath the surface. Our approach uses time series data from the Sentinel-1 and Sentinel-2 satellites, which utilize microwave and visible radiation, respectively. Evaluated on a GrIS-wide dataset, the RPS-GMM model, trained on a single representative sample per class, achieves 85.46% accuracy with Sentinel-1 data alone and 89.70% with combined Sentinel-1 and Sentinel-2 data. This performance significantly surpasses existing machine learning and deep learning models which require a large training data. The results demonstrate the robustness of the RPS-GMM model in capturing the complex temporal dynamics of supraglacial lakes with minimal training data.
Abstract:This survey paper covers the breadth and depth of time-series and spatiotemporal causality methods, and their applications in Earth Science. More specifically, the paper presents an overview of causal discovery and causal inference, explains the underlying causal assumptions, and enlists evaluation techniques and key terminologies of the domain area. The paper elicits the various state-of-the-art methods introduced for time-series and spatiotemporal causal analysis along with their strengths and limitations. The paper further describes the existing applications of several methods for answering specific Earth Science questions such as extreme weather events, sea level rise, teleconnections etc. This survey paper can serve as a primer for Data Science researchers interested in data-driven causal study as we share a list of resources, such as Earth Science datasets (synthetic, simulated and observational data) and open source tools for causal analysis. It will equally benefit the Earth Science community interested in taking an AI-driven approach to study the causality of different dynamic and thermodynamic processes as we present the open challenges and opportunities in performing causality-based Earth Science study.
Abstract:Learning causal relationships solely from observational data provides insufficient information about the underlying causal mechanism and the search space of possible causal graphs. As a result, often the search space can grow exponentially for approaches such as Greedy Equivalence Search (GES) that uses a score-based approach to search the space of equivalence classes of graphs. Prior causal information such as the presence or absence of a causal edge can be leveraged to guide the discovery process towards a more restricted and accurate search space. In this study, we present KGS, a knowledge-guided greedy score-based causal discovery approach that uses observational data and structural priors (causal edges) as constraints to learn the causal graph. KGS is a novel application of knowledge constraints that can leverage any of the following prior edge information between any two variables: the presence of a directed edge, the absence of an edge, and the presence of an undirected edge. We extensively evaluate KGS across multiple settings in both synthetic and benchmark real-world datasets. Our experimental results demonstrate that structural priors of any type and amount are helpful and guide the search process towards an improved performance and early convergence.
Abstract:Causal Discovery (CD) is the process of identifying the cause-effect relationships among the variables of a system from data. Over the years, several methods have been developed primarily based on the statistical properties of data to uncover the underlying causal mechanism. In this study, we present an extensive discussion on the methods designed to perform causal discovery from both independent and identically distributed (i.i.d.) data and time series data. For this purpose, we first introduce the common terminologies in causal discovery, and then provide a comprehensive discussion of the algorithms designed to identify the causal edges in different settings. We further discuss some of the benchmark datasets available for evaluating the performance of the causal discovery methods, available tools or software packages to perform causal discovery readily, and the common metrics used to evaluate these methods. We also test some common causal discovery algorithms on different benchmark datasets, and compare their performances. Finally, we conclude by presenting the common challenges involved in causal discovery, and also, discuss the applications of causal discovery in multiple areas of interest.
Abstract:Conventional temporal causal discovery (CD) methods suffer from high dimensionality, fail to identify lagged causal relationships, and often ignore dynamics in relations. In this study, we present a novel constraint-based CD approach for autocorrelated and non-stationary time series data (eCDANs) capable of detecting lagged and contemporaneous causal relationships along with temporal changes. eCDANs addresses high dimensionality by optimizing the conditioning sets while conducting conditional independence (CI) tests and identifies the changes in causal relations by introducing a surrogate variable to represent time dependency. Experiments on synthetic and real-world data show that eCDANs can identify time influence and outperform the baselines.
Abstract:Learning High-Resolution representations is essential for semantic segmentation. Convolutional neural network (CNN)architectures with downstream and upstream propagation flow are popular for segmentation in medical diagnosis. However, due to performing spatial downsampling and upsampling in multiple stages, information loss is inexorable. On the contrary, connecting layers densely on high spatial resolution is computationally expensive. In this work, we devise a Loose Dense Connection Strategy to connect neurons in subsequent layers with reduced parameters. On top of that, using a m-way Tree structure for feature propagation we propose Receptive Field Chain Network (RFC-Net) that learns high resolution global features on a compressed computational space. Our experiments demonstrates that RFC-Net achieves state-of-the-art performance on Kvasir and CVC-ClinicDB benchmarks for Polyp segmentation.
Abstract:This study presents a novel constraint-based causal discovery approach for autocorrelated and non-stationary time series data (CDANs). Our proposed method addresses several limitations of existing causal discovery methods for autocorrelated and non-stationary time series data, such as high dimensionality, the inability to identify lagged causal relationships, and the overlook of changing modules. Our approach identifies both lagged and instantaneous/contemporaneous causal relationships along with changing modules that vary over time. The method optimizes the conditioning sets in a constraint-based search by considering lagged parents instead of conditioning on the entire past that addresses high dimensionality. The changing modules are detected by considering both contemporaneous and lagged parents. The approach first detects the lagged adjacencies, then identifies the changing modules and contemporaneous adjacencies, and finally determines the causal direction. We extensively evaluated the proposed method using synthetic datasets and a real-world clinical dataset and compared its performance with several baseline approaches. The results demonstrate the effectiveness of the proposed method in detecting causal relationships and changing modules in autocorrelated and non-stationary time series data.
Abstract:Delirium occurs in about 80% cases in the Intensive Care Unit (ICU) and is associated with a longer hospital stay, increased mortality and other related issues. Delirium does not have any biomarker-based diagnosis and is commonly treated with antipsychotic drugs (APD). However, multiple studies have shown controversy over the efficacy or safety of APD in treating delirium. Since randomized controlled trials (RCT) are costly and time-expensive, we aim to approach the research question of the efficacy of APD in the treatment of delirium using retrospective cohort analysis. We plan to use the Causal inference framework to look for the underlying causal structure model, leveraging the availability of large observational data on ICU patients. To explore safety outcomes associated with APD, we aim to build a causal model for delirium in the ICU using large observational data sets connecting various covariates correlated with delirium. We utilized the MIMIC III database, an extensive electronic health records (EHR) dataset with 53,423 distinct hospital admissions. Our null hypothesis is: there is no significant difference in outcomes for delirium patients under different drug-group in the ICU. Through our exploratory, machine learning based and causal analysis, we had findings such as: mean length-of-stay and max length-of-stay is higher for patients in Haloperidol drug group, and haloperidol group has a higher rate of death in a year compared to other two-groups. Our generated causal model explicitly shows the functional relationships between different covariates. For future work, we plan to do time-varying analysis on the dataset.
Abstract:Structural causal models (SCMs) provide a principled approach to identifying causation from observational and experimental data in disciplines ranging from economics to medicine. SCMs, however, require domain knowledge, which is typically represented as graphical models. A key challenge in this context is the absence of a methodological framework for encoding priors (background knowledge) into causal models in a systematic manner. We propose an abstraction called causal knowledge hierarchy (CKH) for encoding priors into causal models. Our approach is based on the foundation of "levels of evidence" in medicine, with a focus on confidence in causal information. Using CKH, we present a methodological framework for encoding causal priors from various data sources and combining them to derive an SCM. We evaluate our approach on a simulated dataset and demonstrate overall performance compared to the ground truth causal model with sensitivity analysis.
Abstract:Object detection in natural scenes can be a challenging task. In many real-life situations, the visible spectrum is not suitable for traditional computer vision tasks. Moving outside the visible spectrum range, such as the thermal spectrum or the near-infrared (NIR) images, is much more beneficial in low visibility conditions, NIR images are very helpful for understanding the object's material quality. In this work, we have taken images with both the Thermal and NIR spectrum for the object detection task. As multi-spectral data with both Thermal and NIR is not available for the detection task, we needed to collect data ourselves. Data collection is a time-consuming process, and we faced many obstacles that we had to overcome. We train the YOLO v3 network from scratch to detect an object from multi-spectral images. Also, to avoid overfitting, we have done data augmentation and tune hyperparameters.