Abstract:Cyber-security for 5G networks is drawing notable attention due to an increase in complex jamming attacks that could target the critical 5G Radio Frequency (RF) domain. These attacks pose a significant risk to heterogeneous network (HetNet) architectures, leading to degradation in network performance. Conventional machine-learning techniques for jamming detection rely on centralized training while increasing the odds of data privacy. To address these challenges, this paper proposes a decentralized two-stage federated learning (FL) framework for jamming detection in 5G femtocells. Our proposed distributed framework encompasses using the Federated Averaging (FedAVG) algorithm to train a Convolutional Autoencoder (CAE) for unsupervised learning. In the second stage, we use a fully connected network (FCN) built on the pre-trained CAE encoder that is trained using Federated Proximal (FedProx) algorithm to perform supervised classification. Our experimental results depict that our proposed framework (FedAVG and FedProx) accomplishes efficient training and prediction across non-IID client datasets without compromising data privacy. Specifically, our framework achieves a precision of 0.94, recall of 0.90, F1-score of 0.92, and an accuracy of 0.92, while minimizing communication rounds to 30 and achieving robust convergence in detecting jammed signals with an optimal client count of 6.
Abstract:Data privacy and protection through anonymization is a critical issue for network operators or data owners before it is forwarded for other possible use of data. With the adoption of Artificial Intelligence (AI), data anonymization augments the likelihood of covering up necessary sensitive information; preventing data leakage and information loss. OpenWiFi networks are vulnerable to any adversary who is trying to gain access or knowledge on traffic regardless of the knowledge possessed by data owners. The odds for discovery of actual traffic information is addressed by applied conditional tabular generative adversarial network (CTGAN). CTGAN yields synthetic data; which disguises as actual data but fostering hidden acute information of actual data. In this paper, the similarity assessment of synthetic with actual data is showcased in terms of clustering algorithms followed by a comparison of performance for unsupervised cluster validation metrics. A well-known algorithm, K-means outperforms other algorithms in terms of similarity assessment of synthetic data over real data while achieving nearest scores 0.634, 23714.57, and 0.598 as Silhouette, Calinski and Harabasz and Davies Bouldin metric respectively. On exploiting a comparative analysis in validation scores among several algorithms, K-means forms the epitome of unsupervised clustering algorithms ensuring explicit usage of synthetic data at the same time a replacement for real data. Hence, the experimental results aim to show the viability of using CTGAN-generated synthetic data in lieu of publishing anonymized data to be utilized in various applications.