Abstract:The era of big data has witnessed an increasing availability of observational data from mobile and social networking, online advertising, web mining, healthcare, education, public policy, marketing campaigns, and so on, which facilitates the development of causal effect estimation. Although significant advances have been made to overcome the challenges in the academic area, such as missing counterfactual outcomes and selection bias, they only focus on source-specific and stationary observational data, which is unrealistic in most industrial applications. In this paper, we investigate a new industrial problem of causal effect estimation from incrementally available observational data and present three new evaluation criteria accordingly, including extensibility, adaptability, and accessibility. We propose a Continual Causal Effect Representation Learning method for estimating causal effects with observational data, which are incrementally available from non-stationary data distributions. Instead of having access to all seen observational data, our method only stores a limited subset of feature representations learned from previous data. Combining selective and balanced representation learning, feature representation distillation, and feature transformation, our method achieves the continual causal effect estimation for new data without compromising the estimation capability for original data. Extensive experiments demonstrate the significance of continual causal effect estimation and the effectiveness of our method.
Abstract:Graph convolutional networks (GCNs) have been proved to be very practical to handle various graph-related tasks. It has attracted considerable research interest to study deep GCNs, due to their potential superior performance compared with shallow ones. However, simply increasing network depth will, on the contrary, hurt the performance due to the over-smoothing problem. Adding residual connection is proved to be effective for learning deep convolutional neural networks (deep CNNs), it is not trivial when applied to deep GCNs. Recent works proposed an initial residual mechanism that did alleviate the over-smoothing problem in deep GCNs. However, according to our study, their algorithms are quite sensitive to different datasets. In their setting, the personalization (dynamic) and correlation (evolving) of how residual applies are ignored. To this end, we propose a novel model called Dynamic evolving initial Residual Graph Convolutional Network (DRGCN). Firstly, we use a dynamic block for each node to adaptively fetch information from the initial representation. Secondly, we use an evolving block to model the residual evolving pattern between layers. Our experimental results show that our model effectively relieves the problem of over-smoothing in deep GCNs and outperforms the state-of-the-art (SOTA) methods on various benchmark datasets. Moreover, we develop a mini-batch version of DRGCN which can be applied to large-scale data. Coupling with several fair training techniques, our model reaches new SOTA results on the large-scale ogbn-arxiv dataset of Open Graph Benchmark (OGB). Our reproducible code is available on GitHub.
Abstract:Causal inference has numerous real-world applications in many domains, such as health care, marketing, political science, and online advertising. Treatment effect estimation, a fundamental problem in causal inference, has been extensively studied in statistics for decades. However, traditional treatment effect estimation methods may not well handle large-scale and high-dimensional heterogeneous data. In recent years, an emerging research direction has attracted increasing attention in the broad artificial intelligence field, which combines the advantages of traditional treatment effect estimation approaches (e.g., propensity score, matching, and reweighing) and advanced machine learning approaches (e.g., representation learning, adversarial learning, and graph neural networks). Although the advanced machine learning approaches have shown extraordinary performance in treatment effect estimation, it also comes with a lot of new topics and new research questions. In view of the latest research efforts in the causal inference field, we provide a comprehensive discussion of challenges and opportunities for the three core components of the treatment effect estimation task, i.e., treatment, covariates, and outcome. In addition, we showcase the promising research directions of this topic from multiple perspectives.
Abstract:This paper illustrates the technologies of user next intent prediction with a concept knowledge graph. The system has been deployed on the Web at Alipay, serving more than 100 million daily active users. Specifically, we propose AlipayKG to explicitly characterize user intent, which is an offline concept knowledge graph in the Life-Service domain modeling the historical behaviors of users, the rich content interacted by users and the relations between them. We further introduce a Transformer-based model which integrates expert rules from the knowledge graph to infer the online user's next intent. Experimental results demonstrate that the proposed system can effectively enhance the performance of the downstream tasks while retaining explainability.
Abstract:Accurate estimation of post-click conversion rate is critical for building recommender systems, which has long been confronted with sample selection bias and data sparsity issues. Methods in the Entire Space Multi-task Model (ESMM) family leverage the sequential pattern of user actions, i.e. $impression\rightarrow click \rightarrow conversion$ to address data sparsity issue. However, they still fail to ensure the unbiasedness of CVR estimates. In this paper, we theoretically demonstrate that ESMM suffers from the following two problems: (1) Inherent Estimation Bias (IEB), where the estimated CVR of ESMM is inherently higher than the ground truth; (2) Potential Independence Priority (PIP) for CTCVR estimation, where there is a risk that the ESMM overlooks the causality from click to conversion. To this end, we devise a principled approach named Entire Space Counterfactual Multi-task Modelling (ESCM$^2$), which employs a counterfactual risk miminizer as a regularizer in ESMM to address both IEB and PIP issues simultaneously. Extensive experiments on offline datasets and online environments demonstrate that our proposed ESCM$^2$ can largely mitigate the inherent IEB and PIP issues and achieve better performance than baseline models.