Abstract:Cyberattacks from within an organization's trusted entities are known as insider threats. Anomaly detection using deep learning requires comprehensive data, but insider threat data is not readily available due to confidentiality concerns of organizations. Therefore, there arises demand to generate synthetic data to explore enhanced approaches for threat analysis. We propose a linear manifold learning-based generative adversarial network, SPCAGAN, that takes input from heterogeneous data sources and adds a novel loss function to train the generator to produce high-quality data that closely resembles the original data distribution. Furthermore, we introduce a deep learning-based hybrid model for insider threat analysis. We provide extensive experiments for data synthesis, anomaly detection, adversarial robustness, and synthetic data quality analysis using benchmark datasets. In this context, empirical comparisons show that GAN-based oversampling is competitive with numerous typical oversampling regimes. For synthetic data generation, our SPCAGAN model overcame the problem of mode collapse and converged faster than previous GAN models. Results demonstrate that our proposed approach has a lower error, is more accurate, and generates substantially superior synthetic insider threat data than previous models.
Abstract:Insider threats are the cyber attacks from within the trusted entities of an organization. Lack of real-world data and issue of data imbalance leave insider threat analysis an understudied research area. To mitigate the effect of skewed class distribution and prove the potential of multinomial classification algorithms for insider threat detection, we propose an approach that combines generative model with supervised learning to perform multi-class classification using deep learning. The generative adversarial network (GAN) based insider detection model introduces Conditional Generative Adversarial Network (CGAN) to enrich minority class samples to provide data for multi-class anomaly detection. The comprehensive experiments performed on the benchmark dataset demonstrates the effectiveness of introducing GAN derived synthetic data and the capability of multi-class anomaly detection in insider activity analysis. Moreover, the method is compared with other existing methods against different parameters and performance metrics.