Abstract:Causal variables in the Markov boundary (MB) have been widely applied in single-label tasks, but few studies address causal variable discovery in multi-label data because of the complex causal relationships involved. Since some variables in a multi-label scenario may carry causal information about multiple labels, this paper investigates multi-label causal variable discovery, together with the problem of distinguishing common causal variables shared by multiple labels from label-specific causal variables associated with individual labels. Considering the multiple MBs that arise under a non-positive joint probability distribution, we explore the relationship between common causal variables and the equivalent-information phenomenon, and find that the solutions are influenced by equivalent information through different mechanisms depending on whether label causality exists. By analyzing these mechanisms, we establish a theoretical property of common causal variables, on which a discovery-and-distinguishing algorithm is built to identify the two types of variables. As in the single-label case, causal variables for multiple labels have broad application prospects. To demonstrate this, we apply the proposed causal mechanism to multi-label feature selection and present an interpretable algorithm that is proven to achieve minimal redundancy and maximal relevance. Extensive experiments demonstrate the efficacy of these contributions.
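The abstract does not spell out the discovery-and-distinguishing algorithm, so the following is only a minimal sketch of the pipeline it implies: learn one Markov boundary per label, then take the intersection as common causal candidates and the remainders as label-specific ones. The IAMB-style forward search, the empirical conditional mutual information estimator, the threshold, and the toy data are illustrative assumptions, not the paper's method.

```python
import numpy as np

def cmi(x, y, z):
    """Empirical conditional mutual information I(x; y | z) for discrete data."""
    if z.shape[1] == 0:
        z_codes = np.zeros(len(x), dtype=int)
    else:
        z_codes = np.unique(z, axis=0, return_inverse=True)[1].reshape(-1)
    total = 0.0
    for zc in np.unique(z_codes):
        m = z_codes == zc
        pz, xs, ys = m.mean(), x[m], y[m]
        for xv in np.unique(xs):
            for yv in np.unique(ys):
                pxy = np.mean((xs == xv) & (ys == yv))
                if pxy > 0:
                    total += pz * pxy * np.log(
                        pxy / (np.mean(xs == xv) * np.mean(ys == yv)))
    return total

def discover_mb(data, label, threshold=0.01):
    """Forward phase of an IAMB-style Markov-boundary search (no shrinking)."""
    mb = []
    while True:
        scores = [(cmi(data[:, f], label, data[:, mb]), f)
                  for f in range(data.shape[1]) if f not in mb]
        best = max(scores) if scores else (0.0, -1)
        if best[0] <= threshold:
            return set(mb)
        mb.append(best[1])

# toy multi-label data: both labels share feature 0; features 1 and 2 are specific
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 6))
y1 = X[:, 0] & X[:, 1]
y2 = X[:, 0] & X[:, 2]

mbs = [discover_mb(X, y1), discover_mb(X, y2)]
common = set.intersection(*mbs)          # candidates shared by all labels
specific = [mb - common for mb in mbs]   # label-specific remainders
print("common:", common, "specific:", specific)
```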
Abstract:The probabilistic classification vector machine (PCVM) synthesizes the advantages of the support vector machine and the relevance vector machine, delivering a sparse Bayesian solution to classification problems. However, the PCVM is currently applicable only to binary cases. Extending the PCVM to multi-class cases via heuristic voting strategies such as one-vs-rest or one-vs-one often results in a dilemma in which classifiers make contradictory predictions, and such strategies may lose the benefits of probabilistic outputs. To overcome this problem, we extend the PCVM and propose a multi-class probabilistic classification vector machine (mPCVM). Two learning algorithms, a top-down algorithm and a bottom-up algorithm, are implemented in the mPCVM. The top-down algorithm obtains maximum a posteriori (MAP) point estimates of the parameters via an expectation-maximization (EM) algorithm, and the bottom-up algorithm is an incremental paradigm that maximizes the marginal likelihood. The superior performance of the mPCVM, especially when the investigated problem has a large number of classes, is demonstrated through extensive evaluation on synthetic and benchmark data sets.
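The mPCVM's EM and incremental updates are not reproduced in the abstract. As a hedged illustration of the model class being extended, the sketch below fits a multi-class kernel classifier with probabilistic (softmax) outputs by MAP estimation under a plain Gaussian prior via L-BFGS; the RBF kernel, the prior, and all hyperparameters are stand-in assumptions, and this is not the paper's truncated prior or its top-down/bottom-up algorithms.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

def rbf_kernel(A, B, gamma=0.5):
    d = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d)

def neg_log_map(w_flat, K, y, n_classes, alpha=1.0):
    """Negative log-posterior: softmax likelihood + Gaussian prior on weights."""
    W = w_flat.reshape(K.shape[1], n_classes)
    log_p = K @ W - logsumexp(K @ W, axis=1, keepdims=True)
    return -log_p[np.arange(len(y)), y].sum() + 0.5 * alpha * (W ** 2).sum()

def grad(w_flat, K, y, n_classes, alpha=1.0):
    W = w_flat.reshape(K.shape[1], n_classes)
    P = np.exp(K @ W - logsumexp(K @ W, axis=1, keepdims=True))
    return (K.T @ (P - np.eye(n_classes)[y]) + alpha * W).ravel()

# toy 3-class data: one Gaussian blob per class
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in ((0, 0), (2, 0), (1, 2))])
y = np.repeat([0, 1, 2], 30)

K = rbf_kernel(X, X)
res = minimize(neg_log_map, np.zeros(K.shape[1] * 3), args=(K, y, 3),
               jac=grad, method="L-BFGS-B")
W = res.x.reshape(-1, 3)
probs = np.exp(K @ W - logsumexp(K @ W, axis=1, keepdims=True))
print("train accuracy:", (probs.argmax(axis=1) == y).mean())
```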
Abstract:Sparse Bayesian learning is a state-of-the-art supervised learning algorithm that can choose a subset of relevant samples from the input data and make reliable probabilistic predictions. However, in the presence of high-dimensional data with irrelevant features, traditional sparse Bayesian classifiers suffer from performance degradation and low efficiency because they fail to eliminate irrelevant features. To tackle this problem, we propose a novel sparse Bayesian embedded feature selection method that adopts truncated Gaussian distributions as both sample and feature priors. The proposed method, called the probabilistic feature selection and classification vector machine (PFCVM_LP), is able to simultaneously select relevant features and samples for classification tasks. To derive analytical solutions, the Laplace approximation is applied to compute approximate posteriors and marginal likelihoods. Parameters and hyperparameters are then optimized by the type-II maximum likelihood method. Experiments on three datasets validate PFCVM_LP along two dimensions: classification performance and effectiveness for feature selection. Finally, we analyze the generalization performance and derive a generalization error bound for PFCVM_LP; by tightening the bound, the importance of feature selection is demonstrated.
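The derivations behind PFCVM_LP are not given in the abstract; the sketch below instead shows the classic ingredients it builds on, a Laplace approximation around the MAP weights and type-II maximum-likelihood (ARD) updates of per-feature precisions, in a plain logistic-regression setting. The truncated-Gaussian priors and the joint sample-and-feature selection of the actual method are not implemented here; all names, priors, and thresholds are illustrative.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def ard_laplace_logreg(X, t, n_outer=30, n_newton=15):
    """Sparse Bayesian logistic regression: Laplace approximation around the
    MAP weights plus type-II ML (ARD) precision updates. An RVM-style sketch
    over feature weights only, not the PFCVM_LP algorithm itself."""
    n, d = X.shape
    alpha = np.ones(d)               # per-feature prior precisions
    w = np.zeros(d)
    for _ in range(n_outer):
        for _ in range(n_newton):    # Newton steps to the MAP under current prior
            p = expit(X @ w)
            g = X.T @ (p - t) + alpha * w
            H = X.T @ ((p * (1 - p))[:, None] * X) + np.diag(alpha)
            w -= np.linalg.solve(H, g)
        Sigma = np.linalg.inv(H)                             # Laplace covariance
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, None)
        alpha = np.minimum(gamma / (w ** 2 + 1e-12), 1e6)    # type-II ML update
    return w, alpha

# toy data: only the first two of ten features carry signal
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
t = (rng.uniform(size=300) < expit(2 * X[:, 0] - 3 * X[:, 1])).astype(float)

w, alpha = ard_laplace_logreg(X, t)
print("selected features:", np.where(alpha < 1e4)[0])  # irrelevant ones prune out
```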