Abstract:In rank aggregation problems (RAP), the solution is usually a consensus ranking that generalizes a set of input orderings. There are different variants that differ not only in terms of the type of rankings that are used as input and output, but also in terms of the objective function employed to evaluate the quality of the desired output ranking. In contrast, in some machine learning tasks (e.g. subgroup discovery) or multimodal optimization tasks, attention is devoted to obtaining several models/results to account for the diversity in the input data or across the search landscape. Thus, in this paper we propose to provide, as the solution to an RAP, a set of rankings to better explain the preferences expressed in the input orderings. We exemplify our proposal through the Optimal Bucket Order Problem (OBOP), an RAP which consists in finding a single consensus ranking (with ties) that generalizes a set of input rankings codified as a precedence matrix. To address this, we introduce the Optimal Set of Bucket Orders Problem (OSBOP), a generalization of the OBOP that aims to produce not a single ranking as output but a set of consensus rankings. Experimental results are presented to illustrate this proposal, showing how, by providing a set of consensus rankings, the fitness of the solution significantly improves with respect to the one of the original OBOP, without losing comprehensibility.
Abstract:Federated Learning has emerged as a promising approach to train machine learning models on decentralized data sources while preserving data privacy. This paper proposes a new federated approach for Naive Bayes (NB) classification, assuming discrete variables. Our approach federates a discriminative variant of NB, sharing meaningless parameters instead of conditional probability tables. Therefore, this process is more reliable against possible attacks. We conduct extensive experiments on 12 datasets to validate the efficacy of our approach, comparing federated and non-federated settings. Additionally, we benchmark our method against the generative variant of NB, which serves as a baseline for comparison. Our experimental results demonstrate the effectiveness of our method in achieving accurate classification.
Abstract:This article presents MCTS-BN, an adaptation of the Monte Carlo Tree Search (MCTS) algorithm for the structural learning of Bayesian Networks (BNs). Initially designed for game tree exploration, MCTS has been repurposed to address the challenge of learning BN structures by exploring the search space of potential ancestral orders in Bayesian Networks. Then, it employs Hill Climbing (HC) to derive a Bayesian Network structure from each order. In large BNs, where the search space for variable orders becomes vast, using completely random orders during the rollout phase is often unreliable and impractical. We adopt a semi-randomized approach to address this challenge by incorporating variable orders obtained from other heuristic search algorithms such as Greedy Equivalent Search (GES), PC, or HC itself. This hybrid strategy mitigates the computational burden and enhances the reliability of the rollout process. Experimental evaluations demonstrate the effectiveness of MCTS-BN in improving BNs generated by traditional structural learning algorithms, exhibiting robust performance even when base algorithm orders are suboptimal and surpassing the gold standard when provided with favorable orders.
Abstract:Bayesian Network (BN) structure learning traditionally centralizes data, raising privacy concerns when data is distributed across multiple entities. This research introduces Federated GES (FedGES), a novel Federated Learning approach tailored for BN structure learning in decentralized settings using the Greedy Equivalence Search (GES) algorithm. FedGES uniquely addresses privacy and security challenges by exchanging only evolving network structures, not parameters or data. It realizes collaborative model development, using structural fusion to combine the limited models generated by each client in successive iterations. A controlled structural fusion is also proposed to enhance client consensus when adding any edge. Experimental results on various BNs from {\sf bnlearn}'s BN Repository validate the effectiveness of FedGES, particularly in high-dimensional (a large number of variables) and sparse data scenarios, offering a practical and privacy-preserving solution for real-world BN structure learning.
Abstract:We propose ODTE, a new ensemble that uses oblique decision trees as base classifiers. Additionally, we introduce STree, the base algorithm for growing oblique decision trees, which leverages support vector machines to define hyperplanes within the decision nodes. We embed a multiclass strategy -- one-vs-one or one-vs-rest -- at the decision nodes, allowing the model to directly handle non-binary classification tasks without the need to cluster instances into two groups, as is common in other approaches from the literature. In each decision node, only the best-performing model SVM -- the one that minimizes an impurity measure for the n-ary classification -- is retained, even if the learned SVM addresses a binary classification subtask. An extensive experimental study involving 49 datasets and various state-of-the-art algorithms for oblique decision tree ensembles has been conducted. Our results show that ODTE ranks consistently above its competitors, achieving significant performance gains when hyperparameters are carefully tuned. Moreover, the oblique decision trees learned through STree are more compact than those produced by other algorithms evaluated in our experiments.
Abstract:This article focuses on the supervised classification of datasets with a large number of variables and a small number of instances. This is the case, for example, for microarray data sets commonly used in bioinformatics. Complex classifiers that require estimating statistics over many variables are not suitable for this type of data. Probabilistic classifiers with low-order probability tables, e.g. NB and AODE, are good alternatives for dealing with this type of data. AODE usually improves NB in accuracy, but suffers from high spatial complexity since $k$ models, each with $n+1$ variables, are included in the AODE ensemble. In this paper, we propose MiniAnDE, an algorithm that includes only a small number of heterogeneous base classifiers in the ensemble, i.e., each model only includes a different subset of the $k$ predictive variables. Experimental evaluation shows that using MiniAnDE classifiers on microarray data is feasible and outperforms NB and other ensembles such as bagging and random forest.