Jun Wen Tang

FedEmail: Performance Measurement of Privacy-friendly Phishing Detection Enabled by Federated Learning

Jul 27, 2020

Chandra Thapa, Jun Wen Tang, Sharif Abuadbba, Yansong Gao, Yifeng Zheng, Seyit A. Camtepe, Surya Nepal, Mahathir Almashor

Abstract: Artificial intelligence (AI) has been applied to phishing email detection. It typically requires rich email data from a collection of sources, and this data usually contains private information that must be protected. So far, AI techniques have focused solely on centralized training, which ultimately accesses sensitive raw email data in the collected data repository. A privacy-friendly AI technique such as federated learning (FL) is therefore desirable. FL enables learning over distributed email datasets in a distributed computing framework and protects their privacy because the raw data need not be accessed during learning. This work, to the best of our knowledge, is the first to investigate the applicability of training an email anti-phishing model via FL. Building upon the Recurrent Convolutional Neural Network for phishing email detection, we comprehensively measure and evaluate FL-based learning performance under various settings, including balanced and imbalanced data distribution among clients, scalability, communication overhead, and transfer learning. Our results show that FL achieves performance in phishing email detection comparable to that of centralized learning. As a trade-off for privacy and distributed learning, FL incurs a communication overhead of 0.179 GB per global epoch per client. Our measurement-based results indicate that FL is suitable for practical scenarios in which data size, including the ratio of phishing to legitimate email samples, varies among clients. In all these scenarios, FL achieves a similar testing accuracy of around 98%. In addition, we demonstrate how clients that join over time can be integrated into FL via transfer learning to improve client-level performance. Transfer learning-enabled training improves testing accuracy by up to 2.6% and yields faster convergence.
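To make the aggregation step behind FL concrete, here is a minimal sketch of one server-side round. It assumes FedAvg-style weighted averaging, which the abstract does not specify, and the names `federated_average`, `clients`, and `sizes` are hypothetical. Each client's parameters are weighted by its local sample count, which is one common way to handle the imbalanced data sizes mentioned above.

```python
from typing import Dict, List

# A model is represented here as a mapping from layer name to a flat
# list of parameters. This is an illustrative simplification, not the
# paper's actual model format.
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights],
                      client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by each client's sample count.

    Assumption: FedAvg-style aggregation. Only parameters are shared;
    the raw emails stay on each client.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


# Two hypothetical clients with imbalanced data sizes (100 vs 300 emails).
clients = [
    {"dense": [1.0, 2.0]},
    {"dense": [3.0, 6.0]},
]
sizes = [100, 300]
global_weights = federated_average(clients, sizes)
print(global_weights)  # {'dense': [2.5, 5.0]}
```

The client with more samples pulls the global parameters toward its own values, so the result sits three quarters of the way toward the second client's weights.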
