Abstract:Bayesian networks are a widely-used class of probabilistic graphical models capable of representing symmetric conditional independence between variables of interest using the topology of the underlying graph. They can be seen as a special case of the much more general class of models called staged trees, which can represent any type of non-symmetric conditional independence. Here we formalize the relationship between these two models and introduce a minimal Bayesian network representation of the staged tree, which can be used to read conditional independences in an intuitive way. Furthermore, we define a new labeled graph, termed asymmetry-labeled directed acyclic graph, whose edges are labeled to denote the type of dependence existing between any two random variables. Various datasets are used to illustrate the methodology, highlighting the need to construct models which more flexibly encode and represent non-symmetric structures.
Abstract:Generative models for classification use the joint probability distribution of the class variable and the features to construct a decision rule. Among generative models, Bayesian networks and naive Bayes classifiers are the most commonly used and provide a clear graphical representation of the relationship among all variables. However, these have the disadvantage of highly restricting the type of relationships that could exist, by not allowing for context-specific independences. Here we introduce a new class of generative classifiers, called staged tree classifiers, which formally account for context-specific independence. They are constructed by a partitioning of the vertices of an event tree from which conditional independence can be formally read. The naive staged tree classifier is also defined, which extends the classic naive Bayes classifier whilst retaining the same complexity. An extensive simulation study shows that the classification accuracy of staged tree classifiers is competitive with those of state-of-the-art classifiers. An applied analysis to predict the fate of the passengers of the Titanic highlights the insights that the new class of generative classifiers can give.
Abstract:stagedtrees is an R package which includes several algorithms for learning the structure of staged trees and chain event graphs from data. Score-based and distance-based algorithms are implemented, as well as various functionalities to plot the models and perform inference. The capabilities of stagedtrees are illustrated using mainly two datasets both included in the package or bundled in R.