Mun Choon Chan

Markov Chain Monte Carlo-Based Machine Unlearning: Unlearning What Needs to be Forgotten

Feb 28, 2022

Quoc Phong Nguyen, Ryutaro Oikawa, Dinil Mon Divakaran, Mun Choon Chan, Bryan Kian Hsiang Low

Abstract: As the use of machine learning (ML) models becomes increasingly popular in many real-world applications, practical challenges arise in model maintenance. One such challenge is to 'undo' the effect of a specific subset of the dataset used to train a model. This subset may contain malicious or adversarial data injected by an attacker, which affects model performance. Alternatively, a service provider may need to remove data pertaining to a specific user to respect that user's privacy. In both cases, the problem is to 'unlearn' a specific subset of the training data from a trained model without incurring the cost of retraining the whole model from scratch. Towards this goal, this paper presents a Markov chain Monte Carlo-based machine unlearning (MCU) algorithm. MCU effectively and efficiently unlearns subsets of the training dataset from a trained model. Furthermore, we show that MCU can explain the effect of a subset of the training dataset on the model prediction. Thus, MCU is useful for examining subsets of data to identify the adversarial data to be removed. Similarly, MCU can be used to erase the lineage of a user's personal data from trained ML models, thus upholding the user's "right to be forgotten". We empirically evaluate the performance of MCU on real-world phishing and diabetes datasets. Results show that MCU can achieve desirable performance by efficiently removing the effect of a subset of the training dataset, and that it outperforms an existing algorithm that uses the remaining dataset.

* Proceedings of the 2022 ACM Asia Conference on Computer and Communications Security (ASIA CCS '22), May 30-June 3, 2022, Nagasaki, Japan
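The abstract does not give MCU's update rule. One standard way to unlearn with MCMC, which may differ from MCU in its details, uses the identity p(θ | D_r) ∝ p(θ | D) / p(D_e | θ): posterior samples drawn once on the full data D are reweighted by 1/p(D_e | θ) to approximate the posterior on the remaining data D_r without rerunning the sampler. This is a minimal sketch assuming a toy 1-D Gaussian-mean model; the data, prior, step size, and all function names are hypothetical.

```python
import math
import random


def log_likelihood(theta, data):
    # Toy model x ~ N(theta, 1); additive constants dropped.
    return sum(-0.5 * (x - theta) ** 2 for x in data)


def log_prior(theta, prior_var=100.0):
    # Broad Gaussian prior theta ~ N(0, prior_var).
    return -0.5 * theta ** 2 / prior_var


def metropolis_hastings(data, n_samples=20000, burn_in=1000, step=0.8, seed=0):
    """Random-walk Metropolis sampler for p(theta | data)."""
    rng = random.Random(seed)
    theta = 0.0
    log_p = log_likelihood(theta, data) + log_prior(theta)
    samples = []
    for i in range(burn_in + n_samples):
        proposal = theta + rng.gauss(0.0, step)
        log_p_new = log_likelihood(proposal, data) + log_prior(proposal)
        # Symmetric proposal: accept with probability min(1, p_new / p_old).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            theta, log_p = proposal, log_p_new
        if i >= burn_in:
            samples.append(theta)
    return samples


def unlearn_weights(samples, erased):
    """Importance weights turning samples of p(theta | D) into p(theta | D_r).

    Uses p(theta | D_r) proportional to p(theta | D) / p(D_e | theta), valid when
    the erased points are conditionally independent of the rest given theta.
    """
    log_w = [-log_likelihood(t, erased) for t in samples]
    m = max(log_w)  # shift before exponentiating, for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


remaining = [1.5, 2.5, 1.8, 2.2, 2.0, 1.9, 2.1, 2.4, 1.6, 2.0]
erased = [3.0]  # hypothetical subset to forget, e.g. a suspected poisoned point

samples = metropolis_hastings(remaining + erased)
weights = unlearn_weights(samples, erased)

full_mean = sum(samples) / len(samples)
unlearned_mean = sum(w * t for w, t in zip(weights, samples))
# Conjugate closed form of the posterior mean after retraining on D_r only.
exact_retrained = sum(remaining) / (len(remaining) + 1 / 100.0)
print(f"full: {full_mean:.3f}  unlearned: {unlearned_mean:.3f}  retrained: {exact_retrained:.3f}")
```

Because the toy model is conjugate, the retrained posterior mean has a closed form to compare against. Reweighting degrades (a few samples carry most of the weight) when the erased subset shifts the posterior far, so this kind of sketch suits small erased subsets.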

Collaborative Machine Learning with Incentive-Aware Model Rewards

Oct 24, 2020

Rachael Hwee Ling Sim, Yehong Zhang, Mun Choon Chan, Bryan Kian Hsiang Low

Abstract: Collaborative machine learning (ML) is an appealing paradigm for building high-quality ML models by training on the aggregated data of many parties. However, these parties are only willing to share their data when given sufficient incentives, such as a guaranteed fair reward based on their contributions. This motivates the need to measure each party's contribution and to design an incentive-aware reward scheme accordingly. This paper proposes to value a party's reward based on the Shapley value and the information gain on model parameters given its data. Each party then receives a model as its reward. To formally incentivize the collaboration, we define several desirable properties (e.g., fairness and stability) that are inspired by cooperative game theory but adapted to our model reward, which is uniquely freely replicable. We then propose a novel model reward scheme that satisfies fairness and trades off among the desirable properties via an adjustable parameter. The value of each party's model reward determined by our scheme is attained by injecting Gaussian noise into the aggregated training data with an optimized noise variance. We empirically demonstrate interesting properties of our scheme and evaluate its performance using synthetic and real-world datasets.

* 37th International Conference on Machine Learning (ICML 2020), Extended version with proofs and additional experimental results, 17 pages
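For readers unfamiliar with the Shapley value, the exact formula φ_i = Σ_{S ⊆ N∖{i}} |S|!(n − |S| − 1)!/n! · [v(S ∪ {i}) − v(S)] can be computed by enumerating coalitions when there are only a few parties. This minimal sketch assumes, purely for illustration, that the coalition value v(S) is the information gain on a single Gaussian mean parameter from all data points pooled in S, so it depends only on the pooled data size. The paper's actual valuation and its noise-based reward construction are not reproduced here, and the party names and sizes are hypothetical.

```python
import math
from itertools import combinations


def information_gain(n_points, prior_var=1.0, noise_var=1.0):
    """I(theta; D) for a scalar Gaussian mean from n_points observations (toy v)."""
    return 0.5 * math.log(1.0 + n_points * prior_var / noise_var)


def shapley_values(party_sizes):
    """Exact Shapley values for v(S) = information_gain(total points in S)."""
    parties = list(party_sizes)
    n = len(parties)
    phi = {}
    for p in parties:
        others = [q for q in parties if q != p]
        total = 0.0
        for k in range(n):  # k = size of the coalition S that p joins
            for coalition in combinations(others, k):
                pts = sum(party_sizes[q] for q in coalition)
                marginal = information_gain(pts + party_sizes[p]) - information_gain(pts)
                weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
                total += weight * marginal
        phi[p] = total
    return phi


sizes = {"A": 100, "B": 50, "C": 10}  # hypothetical number of data points per party
phi = shapley_values(sizes)
print({p: round(v, 4) for p, v in phi.items()})
```

The efficiency property (values summing to v(N) − v(∅)) gives a quick sanity check. Because this toy v grows with the pooled data size, the party with more data receives the larger Shapley value.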

GEE: A Gradient-based Explainable Variational Autoencoder for Network Anomaly Detection

Mar 15, 2019

Quoc Phong Nguyen, Kar Wai Lim, Dinil Mon Divakaran, Kian Hsiang Low, Mun Choon Chan

Abstract: This paper looks into the problem of detecting network anomalies by analyzing NetFlow records. While many previous works have used statistical models and machine learning techniques in a supervised way, such solutions require a large amount of labeled data for training and are unlikely to detect zero-day attacks. Existing anomaly detection solutions also do not provide an easy way to explain or identify attacks in the anomalous traffic. To address these limitations, we develop and present GEE, a framework for detecting and explaining anomalies in network traffic. GEE comprises two components: (i) a Variational Autoencoder (VAE), an unsupervised deep-learning technique for detecting anomalies, and (ii) a gradient-based fingerprinting technique for explaining anomalies. Evaluation of GEE on the recent UGR dataset demonstrates that our approach is effective in detecting different anomalies as well as in identifying fingerprints that are good representations of these various attacks.

* To appear in 2019 IEEE Conference on Communications and Network Security (CNS)
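The abstract leaves the fingerprint's exact definition to the paper. A minimal sketch, assuming the fingerprint is the gradient of the anomaly score (reconstruction error) with respect to each input feature: features whose perturbation most changes the score are the ones that make a flow look anomalous. To stay self-contained, it swaps the VAE for a hand-set toy autoencoder and uses central finite differences instead of backpropagation; the feature names, weights, and flow records are hypothetical.

```python
import math

FEATURES = ["bytes", "packets", "duration", "dst_ports"]  # hypothetical NetFlow features
# Toy encoder weights: "normal" traffic varies only along bytes + packets.
W = [1 / math.sqrt(2), 1 / math.sqrt(2), 0.0, 0.0]


def reconstruct(x):
    # Toy nonlinear autoencoder: 1-D code z = tanh(w . x), decoder x_hat = w * z.
    z = math.tanh(sum(wi * xi for wi, xi in zip(W, x)))
    return [wi * z for wi in W]


def anomaly_score(x):
    # Squared reconstruction error, standing in for the VAE's loss.
    return sum((xi - ri) ** 2 for xi, ri in zip(x, reconstruct(x)))


def gradient_fingerprint(x, h=1e-5):
    # Central finite-difference estimate of d(score)/d(x_i) for each feature i.
    grad = []
    for i in range(len(x)):
        up = list(x)
        up[i] += h
        down = list(x)
        down[i] -= h
        grad.append((anomaly_score(up) - anomaly_score(down)) / (2 * h))
    return grad


normal = [0.5, 0.5, 0.0, 0.0]     # normalized feature values (hypothetical)
anomalous = [0.5, 0.5, 0.0, 2.0]  # e.g. an unusual number of destination ports

fingerprint = gradient_fingerprint(anomalous)
top = max(range(len(fingerprint)), key=lambda i: abs(fingerprint[i]))
print("score:", round(anomaly_score(anomalous), 3), "top feature:", FEATURES[top])
```

In the toy model the encoder only covers the first two features, so an unusual value in the last feature cannot be reconstructed and dominates the gradient; a trained VAE would learn such structure from normal traffic rather than having it hand-set.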
