Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Geoff Webb

Protecting Sensitive Data through Federated Co-Training

Oct 09, 2023

Amr Abourayya, Jens Kleesiek, Kanishka Rao, Erman Ayday, Bharat Rao, Geoff Webb, Michael Kamp

Figure 1 for Protecting Sensitive Data through Federated Co-Training

Figure 2 for Protecting Sensitive Data through Federated Co-Training

Figure 3 for Protecting Sensitive Data through Federated Co-Training

Figure 4 for Protecting Sensitive Data through Federated Co-Training

Abstract:In many critical applications, sensitive data is inherently distributed. Federated learning trains a model collaboratively by aggregating the parameters of locally trained models. This avoids exposing sensitive local data. It is possible, though, to infer upon the sensitive data from the shared model parameters. At the same time, many types of machine learning models do not lend themselves to parameter aggregation, such as decision trees, or rule ensembles. It has been observed that in many applications, in particular healthcare, large unlabeled datasets are publicly available. They can be used to exchange information between clients by distributed distillation, i.e., co-regularizing local training via the discrepancy between the soft predictions of each local client on the unlabeled dataset. This, however, still discloses private information and restricts the types of models to those trainable via gradient-based methods. We propose to go one step further and use a form of federated co-training, where local hard labels on the public unlabeled datasets are shared and aggregated into a consensus label. This consensus label can be used for local training by any supervised machine learning model. We show that this federated co-training approach achieves a model quality comparable to both federated learning and distributed distillation on a set of benchmark datasets and real-world medical datasets. It improves privacy over both approaches, protecting against common membership inference attacks to the highest degree. Furthermore, we show that federated co-training can collaboratively train interpretable models, such as decision trees and rule ensembles, achieving a model quality comparable to centralized training.

Via

Access Paper or Ask Questions

Extremely Fast Decision Tree

Feb 24, 2018

Chaitanya Manapragada, Geoff Webb, Mahsa Salehi

Figure 1 for Extremely Fast Decision Tree

Figure 2 for Extremely Fast Decision Tree

Figure 3 for Extremely Fast Decision Tree

Figure 4 for Extremely Fast Decision Tree

Abstract:We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree---"Extremely Fast Decision Tree", a minor modification to the MOA implementation of Hoeffding Tree---obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost.

* Submitted to KDD'2018

Via

Access Paper or Ask Questions