School of Computer Science and Information Engineering, Hefei University of Technology
Abstract:Intervention intuition is often used in model explanation, where the intervention effect of a feature on the outcome is quantified by the difference in a model's prediction when the feature value is changed from its current value to a baseline value. Such a model intervention effect of a feature is inherently associational. In this paper, we study the conditions under which an intuitive model intervention effect has a causal interpretation, i.e., when it indicates whether a feature is a direct cause of the outcome. This work links the model intervention effect to the causal interpretation of a model. Such an interpretation capability is important since it indicates whether a machine learning model is trustworthy to domain experts. The conditions also reveal the limitations of using a model intervention effect for causal interpretation in an environment with unobserved features. Experiments on semi-synthetic datasets have been conducted to validate the theorems and show the potential of using the model intervention effect for model interpretation.
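The model intervention effect described in the abstract above can be illustrated with a minimal sketch. The function name `model_intervention_effect`, the toy linear model, and the baseline value of 0.0 are assumptions made for illustration only; they are not taken from the paper.

```python
from typing import Callable, List

def model_intervention_effect(
    predict: Callable[[List[float]], float],
    x: List[float],
    feature: int,
    baseline: float,
) -> float:
    """Difference in prediction when one feature is moved from its
    current value to a baseline value, all other features held fixed."""
    x_base = list(x)            # copy so the original instance is untouched
    x_base[feature] = baseline  # replace only the chosen feature
    return predict(x) - predict(x_base)

# Hypothetical model used only for this example.
def toy_model(x: List[float]) -> float:
    return 2.0 * x[0] + 0.5 * x[1]

effect = model_intervention_effect(toy_model, [3.0, 4.0], feature=0, baseline=0.0)
print(effect)
```

The quantity computed here is a property of the fitted model alone. As the abstract notes, whether it also says anything about direct causes of the outcome depends on additional conditions.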
Abstract:Multi-label feature selection serves as an effective means of dealing with high-dimensional multi-label data. To achieve satisfactory performance, existing methods for multi-label feature selection often require the centralization of substantial data from multiple sources. However, in the federated setting, centralizing data from all sources and merging them into a single dataset is not feasible. To tackle this issue, in this paper, we study the challenging problem of causal multi-label feature selection in the federated setting and propose a Federated Causal Multi-label Feature Selection (FedCMFS) algorithm with three novel subroutines. Specifically, FedCMFS first uses the FedCFL subroutine, which considers label-label, label-feature, and feature-feature correlations, to learn the relevant features (candidate parents and children) of each class label while preserving data privacy without centralizing data. Second, FedCMFS employs the FedCFR subroutine to selectively recover the missed true relevant features. Finally, FedCMFS utilizes the FedCFC subroutine to remove false relevant features. Extensive experiments on 8 datasets show that FedCMFS is effective for causal multi-label feature selection in the federated setting.
Abstract:An essential and challenging problem in causal inference is causal effect estimation from observational data. The problem becomes more difficult in the presence of unobserved confounding variables. The front-door adjustment is a practical approach for dealing with unobserved confounding variables. However, the restrictions of the standard front-door adjustment are difficult to satisfy in practice. In this paper, we relax some of the restrictions by proposing the concept of conditional front-door (CFD) adjustment and develop the theorem that guarantees the causal effect identifiability of CFD adjustment. Furthermore, as it is often impossible for a CFD variable to be given in practice, it is desirable to learn it from data. By leveraging the ability of deep generative models, we propose CFDiVAE to learn the representation of the CFD adjustment variable directly from data with the identifiable Variational AutoEncoder and formally prove the model identifiability. Extensive experiments on synthetic datasets validate the effectiveness of CFDiVAE and its superiority over existing methods. The experiments also show that the performance of CFDiVAE is less sensitive to the causal strength of unobserved confounding variables. We further apply CFDiVAE to a real-world dataset to demonstrate its potential application.
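For orientation, the standard front-door adjustment that the abstract above sets out to relax can be written as follows. This is the textbook formula, not the conditional variant proposed in the paper; the symbols $X$ (treatment), $Y$ (outcome), and $Z$ (a variable set satisfying the front-door criterion) are notation assumed here.

```latex
P\bigl(y \mid \mathrm{do}(x)\bigr)
  = \sum_{z} P(z \mid x) \sum_{x'} P\bigl(y \mid x', z\bigr)\, P(x')
```

The inner sum adjusts for the treatment when estimating the effect of $Z$ on $Y$, and the outer sum propagates the effect of $X$ through $Z$, so no unobserved confounder of $X$ and $Y$ needs to be measured.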
Abstract:Causal feature selection has recently received increasing attention in machine learning. Existing causal feature selection algorithms select unique causal features of a class variable as the optimal feature subset. However, a class variable usually has multiple states, and it is unfair to select the same causal features for different states of a class variable. To address this problem, we employ the class-specific mutual information to evaluate the causal information carried by each state of the class attribute, and theoretically analyze the unique relationship between each state and the causal features. Based on this, a Fair Causal Feature Selection algorithm (FairCFS) is proposed to fairly identify the causal features for each state of the class variable. Specifically, FairCFS uses the pairwise comparisons of class-specific mutual information and the size of class-specific mutual information values from the perspective of each state, and follows a divide-and-conquer framework to find causal features. The correctness and application condition of FairCFS are theoretically proved, and extensive experiments are conducted to demonstrate the efficiency and superiority of FairCFS compared to the state-of-the-art approaches.
Abstract:An essential problem in causal inference is estimating causal effects from observational data. The problem becomes more challenging with the presence of unobserved confounders. When there are unobserved confounders, the commonly used back-door adjustment is not applicable. Although the instrumental variable (IV) methods can deal with unobserved confounders, they all assume that the treatment directly affects the outcome, and there is no mediator between the treatment and the outcome. This paper aims to use the front-door criterion to address the challenging problem with the presence of unobserved confounders and mediators. In practice, it is often difficult to identify the set of variables used for front-door adjustment from data. By leveraging the ability of deep generative models in representation learning, we propose FDVAE to learn the representation of a Front-Door adjustment set with a Variational AutoEncoder, instead of trying to search for a set of variables for front-door adjustment. Extensive experiments on synthetic datasets validate the effectiveness of FDVAE and its superiority over existing methods. The experiments also show that the performance of FDVAE is not sensitive to the causal strength of unobserved confounders and is feasible in the case of dimensionality mismatch between learned representations and the ground truth. We further apply the method to three real-world datasets to demonstrate its potential applications.
Abstract:Causal structure learning has been extensively studied and widely used in machine learning and various applications. To achieve ideal performance, existing causal structure learning algorithms often need to centralize a large amount of data from multiple data sources. However, in the privacy-preserving setting, it is impossible to centralize data from all sources and put them together as a single dataset. To preserve data privacy, federated learning as a new learning paradigm has attracted much attention in machine learning in recent years. In this paper, we study a privacy-aware causal structure learning problem in the federated setting and propose a novel Federated PC (FedPC) algorithm with two new strategies for preserving data privacy without centralizing data. Specifically, we first propose a novel layer-wise aggregation strategy for a seamless adaptation of the PC algorithm into the federated learning paradigm for federated skeleton learning, then we design an effective strategy for learning consistent separation sets for federated edge orientation. Extensive experiments validate that FedPC is effective for causal structure learning in the federated learning setting.
Abstract:This paper studies the problem of estimating the contributions of features to the prediction of a specific instance by a machine learning model and the overall contribution of a feature to the model. The causal effect of a feature (variable) on the predicted outcome reflects the contribution of the feature to a prediction very well. A challenge is that most existing causal effects cannot be estimated from data without a known causal graph. In this paper, we define an explanatory causal effect based on a hypothetical ideal experiment. The definition brings several benefits to model-agnostic explanations. First, explanations are transparent and have causal meanings. Second, the explanatory causal effect estimation can be data-driven. Third, the causal effects provide both a local explanation for a specific prediction and a global explanation showing the overall importance of a feature in a predictive model. We further propose a method using individual and combined variables based on explanatory causal effects for explanations. Experiments on several real-world data sets show that the definition and the method work.
Abstract:The instrumental variable (IV) approach is a powerful way to infer the causal effect of a treatment on an outcome of interest from observational data, even when there exist latent confounders between the treatment and the outcome. However, existing IV methods require that an IV is selected and justified with domain knowledge. An invalid IV may lead to biased estimates. Hence, discovering a valid IV is critical to the applications of IV methods. In this paper, we study and design a data-driven algorithm to discover valid IVs from data under mild assumptions. We develop the theory based on partial ancestral graphs (PAGs) to support the search for a set of candidate Ancestral IVs (AIVs), and for each possible AIV, the identification of its conditioning set. Based on the theory, we propose a data-driven algorithm to discover a pair of IVs from data. The experiments on synthetic and real-world datasets show that the developed IV discovery algorithm yields accurate estimates of causal effects in comparison with the state-of-the-art IV based causal effect estimators.
Abstract:Recent years have witnessed increasing interest in few-shot knowledge graph completion (FKGC), which aims to infer unseen query triples for a few-shot relation using a handful of reference triples of the relation. The primary focus of existing FKGC methods lies in learning the relation representations that can reflect the common information shared by the query and reference triples. To this end, these methods learn the embeddings of entities with their direct neighbors, and use the concatenation of the entity embeddings as the relation representations. However, the entity embeddings learned only from direct neighborhoods may have low expressiveness when the entity has sparse neighbors or shares a common local neighborhood with other entities. Moreover, the embeddings of two entities are insufficient to represent the semantic information of their relationship, especially when they have multiple relations. To address these issues, we propose a Relation-Specific Context Learning (RSCL) framework, which exploits graph contexts of triples to capture the semantic information of relations and entities simultaneously. Specifically, we first extract graph contexts for each triple, which can provide long-term entity-relation dependencies. To model the graph contexts, we then develop a hierarchical relation-specific learner to learn global and local relation-specific representations for relations by capturing contextualized information of triples and incorporating local information of entities. Finally, we utilize the learned representations to predict the likelihood of the query triples. Experimental results on two public datasets demonstrate that RSCL outperforms state-of-the-art FKGC methods.
Abstract:The local-to-global learning approach plays an essential role in Bayesian network (BN) structure learning. Existing local-to-global learning algorithms first construct the skeleton of a DAG (directed acyclic graph) by learning the MB (Markov blanket) or PC (parents and children) of each variable in a data set, then orient edges in the skeleton. However, existing MB or PC learning methods are often computationally expensive, especially with a large BN, resulting in inefficient local-to-global learning algorithms. To tackle the problem, in this paper, we develop an efficient local-to-global learning approach using feature selection. Specifically, we first analyze the rationale of the well-known Minimum-Redundancy and Maximum-Relevance (MRMR) feature selection approach for learning a PC set of a variable. Based on the analysis, we propose an efficient F2SL (feature selection-based structure learning) approach to local-to-global BN structure learning. The F2SL approach first employs the MRMR approach to learn a DAG skeleton, then orients edges in the skeleton. Employing independence tests or score functions for orienting edges, we instantiate the F2SL approach into two new algorithms, F2SL-c (using independence tests) and F2SL-s (using score functions). Experiments validate that, compared to the state-of-the-art local-to-global BN learning algorithms, the proposed algorithms are more efficient and provide competitive structure learning quality.
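To make the MRMR idea mentioned in the abstract above concrete, here is a minimal sketch of a generic greedy MRMR selector for discrete variables. It is not the F2SL algorithm itself: the function names, the "relevance minus mean redundancy" scoring rule, and the toy data are all assumptions chosen for illustration.

```python
from collections import Counter
from math import log
from typing import Dict, List, Sequence

def mutual_information(a: Sequence[int], b: Sequence[int]) -> float:
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(a)
    pa, pb, pab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum((c / n) * log(c * n / (pa[x] * pb[y])) for (x, y), c in pab.items())

def mrmr_select(features: Dict[str, Sequence[int]], target: Sequence[int], k: int) -> List[str]:
    """Greedily pick up to k features, maximising relevance to the target
    minus mean redundancy with the features already selected."""
    selected: List[str] = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name: str) -> float:
            relevance = mutual_information(features[name], target)
            if not selected:
                return relevance
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy data (hypothetical): the target is determined jointly by A and C; B duplicates A.
a = [0, 0, 0, 0, 1, 1, 1, 1]
c = [0, 0, 1, 1, 0, 0, 1, 1]
y = [2 * ai + ci for ai, ci in zip(a, c)]
features = {"A": a, "B": list(a), "C": c}

selected = mrmr_select(features, y, k=2)
print(selected)
```

In this toy setup the duplicate feature is penalised by the redundancy term once its twin has been chosen, so the selector prefers the non-redundant feature C as its second pick. Using such a selected set as a candidate PC set for a variable is the role MRMR plays in the abstract; the edge-orientation step is not sketched here.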