Abstract: In this study, we present a multi-class graphical Bayesian predictive classifier that incorporates model selection uncertainty into the standard Bayesian formalism. For each class, the dependence structure underlying the observed features is represented by a set of decomposable Gaussian graphical models. Emphasis is then placed on Bayesian model averaging, which takes full account of the class-specific model uncertainty by averaging over the posterior graph probabilities. Explicit evaluation of these model probabilities is well known to be infeasible. To address this issue, we consider the particle Gibbs strategy of Olsson et al. (2018b) for posterior sampling from decomposable graphical models, which uses the Christmas tree algorithm of Olsson et al. (2018a) as the proposal kernel. We also derive a strong hyper Markov law, which we call the hyper normal Wishart law, that allows the resulting Bayesian calculations to be performed locally. The proposed predictive graphical classifier shows superior performance compared with the ordinary Bayesian predictive rule that does not account for model uncertainty, as well as with a number of out-of-the-box classifiers.
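To make the model-averaging step concrete, the following is a minimal sketch (not the authors' implementation) of how a Bayesian model-averaged class score could be computed once a posterior sampler such as the particle Gibbs scheme has produced graph samples per class. The function name `bma_class_scores` and the summary of each sampled graph by a hypothetical Gaussian predictive pair `(mean, cov)` are assumptions made purely for illustration.

```python
# Illustrative sketch: Bayesian model averaging for a multi-class Gaussian
# graphical classifier. We assume a posterior sampler has already produced,
# for each class, graph samples summarized here by hypothetical per-sample
# Gaussian predictive parameters (mean, cov).
import numpy as np
from scipy.stats import multivariate_normal

def bma_class_scores(x, class_samples, priors):
    """Average the predictive density over posterior graph samples per class.

    class_samples: dict class -> list of (mean, cov) pairs, one per sampled
                   graph (hypothetical summaries of the graph posterior).
    priors:        dict class -> prior class probability.
    """
    scores = {}
    for c, samples in class_samples.items():
        # Monte Carlo estimate of p(x | class c) under model averaging:
        # (1/M) * sum_G p(x | G, c), with graphs G drawn from p(G | data, c).
        dens = np.mean([multivariate_normal.pdf(x, m, S) for m, S in samples])
        scores[c] = priors[c] * dens
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

# Toy usage: two classes, two posterior graph samples each.
rng = np.random.default_rng(0)
samples = {
    0: [(np.zeros(3), np.eye(3)), (np.zeros(3), 1.5 * np.eye(3))],
    1: [(np.ones(3), np.eye(3)), (np.ones(3), 0.5 * np.eye(3))],
}
print(bma_class_scores(rng.normal(size=3), samples, {0: 0.5, 1: 0.5}))
```

Classification then picks the class with the highest averaged posterior score; the single-model Bayesian predictive rule corresponds to keeping only one `(mean, cov)` pair per class.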
Abstract: We present a growing-dimension asymptotic formalism. The perspective in this paper is classification theory, and we show that the formalism can accommodate probabilistic network classifiers, including the naive Bayes model and its augmented version. When represented as Bayesian networks, these classifiers have an important advantage: the corresponding discriminant function turns out to be a special case of a generalized additive model, which makes it possible to obtain closed-form expressions for the asymptotic misclassification probabilities used here as a measure of classification accuracy. Moreover, we propose a new quantity for assessing the discriminative power of a set of features, which is then used to extend the augmented naive Bayes classifier. The result is a weighted form of the augmented naive Bayes classifier that distributes weights among the sets of features according to their discriminative power. We derive the asymptotic distribution of the sample-based discriminative power and show that it is seriously overestimated in the high-dimensional case. We then apply this result to find the type of weighting that is optimal in the sense of minimum misclassification probability.
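The additive structure of the discriminant can be illustrated with a small sketch (not the paper's derivation): a two-class weighted naive Bayes score for Gaussian features, where each feature contributes a log-likelihood-ratio term scaled by a weight. The weights here are hypothetical stand-ins for the paper's discriminative-power quantity, and the function name `weighted_nb_discriminant` is invented for the example.

```python
# Illustrative sketch: a weighted naive Bayes discriminant for two classes
# with Gaussian features sharing a common variance per feature. The weights
# w_j are hypothetical plug-ins for the paper's discriminative-power quantity.
import numpy as np

def weighted_nb_discriminant(x, mu0, mu1, sigma2, weights, log_prior_ratio=0.0):
    """Weighted additive discriminant: classify as class 1 if the score > 0.

    Each term is the per-feature log-likelihood ratio
    log N(x_j; mu1_j, sigma2_j) - log N(x_j; mu0_j, sigma2_j),
    scaled by that feature's (assumed) discriminative power.
    """
    llr = ((x - mu0) ** 2 - (x - mu1) ** 2) / (2.0 * sigma2)
    return log_prior_ratio + np.dot(weights, llr)

# Toy usage: features 1-2 separate the classes well, features 3-4 barely do,
# so they receive smaller (hypothetical) discriminative-power weights.
mu0, mu1 = np.zeros(4), np.array([1.0, 1.0, 0.1, 0.1])
sigma2 = np.ones(4)
weights = np.array([1.0, 1.0, 0.2, 0.2])
x = np.array([0.9, 1.1, 0.0, 0.2])
print(int(weighted_nb_discriminant(x, mu0, mu1, sigma2, weights) > 0))
```

Setting all weights to one recovers the ordinary (unweighted) naive Bayes discriminant; down-weighting weakly discriminative feature sets is what counteracts the overestimation of discriminative power in the high-dimensional regime.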