Abstract:A major limitation of machine learning (ML) prediction models is that they recover associational, rather than causal, predictive relationships between variables. In high-stakes automation applications of ML this is problematic, as the model often learns spurious, non-causal associations. This paper proposes mechanism learning, a simple method which uses front-door causal bootstrapping to deconfound observational data such that any appropriate ML model is forced to learn predictive relationships between effects and their causes (reverse causal inference), despite the potential presence of multiple unknown and unmeasured confounding. Effect variables can be very high dimensional, and the predictive relationship nonlinear, as is common in ML applications. This novel method is widely applicable, the only requirement is the existence of a mechanism variable mediating the cause (prediction target) and effect (feature data), which is independent of the (unmeasured) confounding variables. We test our method on fully synthetic, semi-synthetic and real-world datasets, demonstrating that it can discover reliable, unbiased, causal ML predictors where by contrast, the same ML predictor trained naively using classical supervised learning on the original observational data, is heavily biased by spurious associations. We provide code to implement the results in the paper, online.
Abstract:The $K$-medoids problem is a challenging combinatorial clustering task, widely used in data analysis applications. While numerous algorithms have been proposed to solve this problem, none of these are able to obtain an exact (globally optimal) solution for the problem in polynomial time. In this paper, we present EKM: a novel algorithm for solving this problem exactly with worst-case $O\left(N^{K+1}\right)$ time complexity. EKM is developed according to recent advances in transformational programming and combinatorial generation, using formal program derivation steps. The derived algorithm is provably correct by construction. We demonstrate the effectiveness of our algorithm by comparing it against various approximate methods on numerous real-world datasets. We show that the wall-clock run time of our algorithm matches the worst-case time complexity analysis on synthetic datasets, clearly outperforming the exponential time complexity of benchmark branch-and-bound based MIP solvers. To our knowledge, this is the first, rigorously-proven polynomial time, practical algorithm for this ubiquitous problem.
Abstract:Causal identification in causal Bayes nets (CBNs) is an important tool in causal inference allowing the derivation of interventional distributions from observational distributions where this is possible in principle. However, most existing formulations of causal identification using techniques such as d-separation and do-calculus are expressed within the mathematical language of classical probability theory on CBNs. However, there are many causal settings where probability theory and hence current causal identification techniques are inapplicable such as relational databases, dataflow programs such as hardware description languages, distributed systems and most modern machine learning algorithms. We show that this restriction can be lifted by replacing the use of classical probability theory with the alternative axiomatic foundation of symmetric monoidal categories. In this alternative axiomatization, we show how an unambiguous and clean distinction can be drawn between the general syntax of causal models and any specific semantic implementation of that causal model. This allows a purely syntactic algorithmic description of general causal identification by a translation of recent formulations of the general ID algorithm through fixing. Our description is given entirely in terms of the non-parametric ADMG structure specifying a causal model and the algebraic signature of the corresponding monoidal category, to which a sequence of manipulations is then applied so as to arrive at a modified monoidal category in which the desired, purely syntactic interventional causal model, is obtained. We use this idea to derive purely syntactic analogues of classical back-door and front-door causal adjustment, and illustrate an application to a more complex causal model.
Abstract:Algorithms for solving the linear classification problem have a long history, dating back at least to 1936 with linear discriminant analysis. For linearly separable data, many algorithms can obtain the exact solution to the corresponding 0-1 loss classification problem efficiently, but for data which is not linearly separable, it has been shown that this problem, in full generality, is NP-hard. Alternative approaches all involve approximations of some kind, including the use of surrogates for the 0-1 loss (for example, the hinge or logistic loss) or approximate combinatorial search, none of which can be guaranteed to solve the problem exactly. Finding efficient algorithms to obtain an exact i.e. globally optimal solution for the 0-1 loss linear classification problem with fixed dimension, remains an open problem. In research we report here, we detail the construction of a new algorithm, incremental cell enumeration (ICE), that can solve the 0-1 loss classification problem exactly in polynomial time. To our knowledge, this is the first, rigorously-proven polynomial time algorithm for this long-standing problem.
Abstract:Dysphonia is one of the early symptoms of Parkinson's disease (PD). Most existing methods use feature selection methods to find the optimal subset of voice features for all PD patients. Few have considered the heterogeneity between patients, which implies the need to provide specific prediction models for different patients. However, building the specific model faces the challenge of small sample size, which makes it lack generalization ability. Instance transfer is an effective way to solve this problem. Therefore, this paper proposes a patient-specific game-based transfer (PSGT) method for PD severity prediction. First, a selection mechanism is used to select PD patients with similar disease trends to the target patient from the source domain, which greatly reduces the risk of negative transfer. Then, the contribution of the transferred subjects and their instances to the disease estimation of the target subject is fairly evaluated by the Shapley value, which improves the interpretability of the method. Next, the proportion of valid instances in the transferred subjects is determined, and the instances with higher contribution are transferred to further reduce the difference between the transferred instance subset and the target subject. Finally, the selected subset of instances is added to the training set of the target subject, and the extended data is fed into the random forest to improve the performance of the method. Parkinson's telemonitoring dataset is used to evaluate the feasibility and effectiveness. Experiment results show that the PSGT has better performance in both prediction error and stability over compared methods.
Abstract:Dynamic programming (DP) is a broadly applicable algorithmic design paradigm for the efficient, exact solution of otherwise intractable, combinatorial problems. However, the design of such algorithms is often presented informally in an ad-hoc manner, and as a result is often difficult to apply correctly. In this paper, we present a rigorous algebraic formalism for systematically deriving novel DP algorithms, either from existing DP algorithms or from simple functional recurrences. These derivations lead to algorithms which are provably correct and polymorphic over any semiring, which means that they can be applied to the full scope of combinatorial problems expressible in terms of semirings. This includes, for example: optimization, optimal probability and Viterbi decoding, probabilistic marginalization, logical inference, fuzzy sets, differentiable softmax, and relational and provenance queries. The approach, building on many ideas from the existing literature on constructive algorithmics, exploits generic properties of (semiring) polymorphic functions, tupling and formal sums (lifting), and algebraic simplifications arising from constraint algebras. We demonstrate the effectiveness of this formalism for some example applications arising in signal processing, bioinformatics and reliability engineering.
Abstract:We propose a robust framework for interpretable, few-shot analysis of non-stationary sequential data based on flexible graphical models to express the structured distribution of sequential events, using prototype radial basis function (RBF) neural network emissions. A motivational link is demonstrated between prototypical neural network architectures for few-shot learning and the proposed RBF network infinite hidden Markov model (RBF-iHMM). We show that RBF networks can be efficiently specified via prototypes allowing us to express complex nonstationary patterns, while hidden Markov models are used to infer principled high-level Markov dynamics. The utility of the framework is demonstrated on biomedical signal processing applications such as automated seizure detection from EEG data where RBF networks achieve state-of-the-art performance using a fraction of the data needed to train long-short-term memory variational autoencoders.
Abstract:Latent feature models are canonical tools for exploratory analysis in classical and modern multivariate statistics. Many high-dimensional data can be approximated using a union of low-dimensional subspaces or factors. The allocation of data points to these latent factors itself typically uncovers key relationships in the input and helps us represent hidden causes explaining the data. A widely adopted view is to model feature allocation with discrete latent variables, where each data point is associated with a binary vector indicating latent features possessed by this data point. In this work we revise some of the issues with existing parametric and Bayesian nonparametric processes for feature allocation modelling and propose a novel framework that can capture wider set of feature allocation distributions. This new framework allows for explicit control over the number of features used to express each point and enables a more flexible set of allocation distributions including feature allocations with different sparsity levels. We use this approach to derive a novel adaptive Factor analysis (aFA), as well as, an adaptive probabilistic principle component analysis (aPPCA) capable of flexible structure discovery and dimensionality reduction in a wide case of scenarios. We derive both standard a Gibbs sampler, as well as, an expectation-maximization inference algorithms for aPPCA and aFA that converge orders of magnitude faster to a point estimate. We demonstrate that aFA can handle richer feature distributions, when compared to widely used sparse FA models and nonparametric FA models. We show that aPPCA and aFA can infer interpretable high level features both when applied on raw MNIST, when applied for interpreting autoencoder features. We also demonstrate an application of the aPPCA to more robust blind source separation for functional magnetic resonance imaging (fMRI).
Abstract:To draw scientifically meaningful conclusions and build reliable models of quantitative phenomena, cause and effect must be taken into consideration (either implicitly or explicitly). This is particularly challenging when the measurements are not from controlled experimental (interventional) settings, since cause and effect can be obscured by spurious, indirect influences. Modern predictive techniques from machine learning are capable of capturing high-dimensional, nonlinear relationships between variables while relying on few parametric or probabilistic model assumptions. However, since these techniques are associational, applied to observational data they are prone to picking up spurious influences from non-experimental (observational) data, making their predictions unreliable. Techniques from causal inference, such as probabilistic causal diagrams and do-calculus, provide powerful (nonparametric) tools for drawing causal inferences from such observational data. However, these techniques are often incompatible with modern, nonparametric machine learning algorithms since they typically require explicit probabilistic models. Here, we develop causal bootstrapping for augmenting classical nonparametric bootstrap resampling with information on the causal relationship between variables. This makes it possible to resample observational data such that, if it is possible to identify an interventional relationship from that data, new data representing that relationship can be simulated from the original observational data. In this way, we can use modern machine learning algorithms unaltered to make statistically powerful, yet causally-robust, predictions. We develop several causal bootstrapping algorithms for drawing interventional inferences from observational data, for classification and regression problems, and demonstrate, using synthetic and real-world examples, the value of this approach.
Abstract:Using the linear Gaussian latent variable model as a starting point we relax some of the constraints it imposes by deriving a nonparametric latent feature Gaussian variable model. This model introduces additional discrete latent variables to the original structure. The Bayesian nonparametric nature of this new model allows it to adapt complexity as more data is observed and project each data point onto a varying number of subspaces. The linear relationship between the continuous latent and observed variables make the proposed model straightforward to interpret, resembling a locally adaptive probabilistic PCA (A-PPCA). We propose two alternative Gibbs sampling procedures for inference in the new model and demonstrate its applicability on sensor data for passive health monitoring.