Abstract:Identifying technological convergence among emerging technologies in cybersecurity is crucial for advancing science and fostering innovation. Unlike previous studies focusing on the binary relationship between a paper and the concept it attributes to technology, our approach utilizes attribution scores to enhance the relationships between research papers, combining keywords, citation rates, and collaboration status with specific technological concepts. The proposed method integrates text mining and bibliometric analyses to formulate and predict technological proximity indices for encryption technologies using the "OpenAlex" catalog. Our case study findings highlight a significant convergence between blockchain and public-key cryptography, evidenced by the increasing proximity indices. These results offer valuable strategic insights for those contemplating investments in these domains.
Abstract:DDoS attacks involve overwhelming a target system with a large number of requests or traffic from multiple sources, disrupting the normal traffic of a targeted server, service, or network. Distinguishing between legitimate traffic and malicious traffic is a challenging task. It is possible to classify legitimate traffic and malicious traffic and analysis the network traffic by using machine learning and deep learning techniques. However, an inter-model explanation implemented to classify a traffic flow whether is benign or malicious is an important investigation of the inner working theory of the model to increase the trustworthiness of the model. Explainable Artificial Intelligence (XAI) can explain the decision-making of the machine learning models that can be classified and identify DDoS traffic. In this context, we proposed a framework that can not only classify legitimate traffic and malicious traffic of DDoS attacks but also use SHAP to explain the decision-making of the classifier model. To address this concern, we first adopt feature selection techniques to select the top 20 important features based on feature importance techniques (e.g., XGB-based SHAP feature importance). Following that, the Multi-layer Perceptron Network (MLP) part of our proposed model uses the optimized features of the DDoS attack dataset as inputs to classify legitimate and malicious traffic. We perform extensive experiments with all features and selected features. The evaluation results show that the model performance with selected features achieves above 99\% accuracy. Finally, to provide interpretability, XAI can be adopted to explain the model performance between the prediction results and features based on global and local explanations by SHAP, which can better explain the results achieved by our proposed framework.
Abstract:Since their proposal in the 2014 paper by Ian Goodfellow, there has been an explosion of research into the area of Generative Adversarial Networks. While they have been utilised in many fields, the realm of malware research is a problem space in which GANs have taken root. From balancing datasets to creating unseen examples in rare classes, GAN models offer extensive opportunities for application. This paper surveys the current research and literature for the use of Generative Adversarial Networks in the malware problem space. This is done with the hope that the reader may be able to gain an overall understanding as to what the Generative Adversarial model provides for this field, and for what areas within malware research it is best utilised. It covers the current related surveys, the different categories of GAN, and gives the outcomes of recent research into optimising GANs for different topics, as well as future directions for exploration.
Abstract:Machine learning algorithms have been widely used in intrusion detection systems, including Multi-layer Perceptron (MLP). In this study, we proposed a two-stage model that combines the Birch clustering algorithm and MLP classifier to improve the performance of network anomaly multi-classification. In our proposed method, we first apply Birch or Kmeans as an unsupervised clustering algorithm to the CICIDS-2017 dataset to pre-group the data. The generated pseudo-label is then added as an additional feature to the training of the MLP-based classifier. The experimental results show that using Birch and K-Means clustering for data pre-grouping can improve intrusion detection system performance. Our method can achieve 99.73% accuracy in multi-classification using Birch clustering, which is better than similar researches using a stand-alone MLP model.
Abstract:Anomaly detection for indoor air quality (IAQ) data has become an important area of research as the quality of air is closely related to human health and well-being. However, traditional statistics and shallow machine learning-based approaches in anomaly detection in the IAQ area could not detect anomalies involving the observation of correlations across several data points (i.e., often referred to as long-term dependences). We propose a hybrid deep learning model that combines LSTM with Autoencoder for anomaly detection tasks in IAQ to address this issue. In our approach, the LSTM network is comprised of multiple LSTM cells that work with each other to learn the long-term dependences of the data in a time-series sequence. Autoencoder identifies the optimal threshold based on the reconstruction loss rates evaluated on every data across all time-series sequences. Our experimental results, based on the Dunedin CO2 time-series dataset obtained through a real-world deployment of the schools in New Zealand, demonstrate a very high and robust accuracy rate (99.50%) that outperforms other similar models.
Abstract:The effectiveness of machine learning models is significantly affected by the size of the dataset and the quality of features as redundant and irrelevant features can radically degrade the performance. This paper proposes IGRF-RFE: a hybrid feature selection method tasked for multi-class network anomalies using a Multilayer perceptron (MLP) network. IGRF-RFE can be considered as a feature reduction technique based on both the filter feature selection method and the wrapper feature selection method. In our proposed method, we use the filter feature selection method, which is the combination of Information Gain and Random Forest Importance, to reduce the feature subset search space. Then, we apply recursive feature elimination(RFE) as a wrapper feature selection method to further eliminate redundant features recursively on the reduced feature subsets. Our experimental results obtained based on the UNSW-NB15 dataset confirm that our proposed method can improve the accuracy of anomaly detection while reducing the feature dimension. The results show that the feature dimension is reduced from 42 to 23 while the multi-classification accuracy of MLP is improved from 82.25% to 84.24%.
Abstract:The network intrusion detection task is challenging because of the imbalanced and unlabeled nature of the dataset it operates on. Existing generative adversarial networks (GANs), are primarily used for creating synthetic samples from reals. They also have been proved successful in anomaly detection tasks. In our proposed method, we construct the trained encoder-discriminator as a one-class classifier based on Bidirectional GAN (Bi-GAN) for detecting anomalous traffic from normal traffic other than calculating expensive and complex anomaly scores or thresholds. Our experimental result illustrates that our proposed method is highly effective to be used in network intrusion detection tasks and outperforms other similar generative methods on the NSL-KDD dataset.
Abstract:The new generation of botnets leverages Artificial Intelligent (AI) techniques to conceal the identity of botmasters and the attack intention to avoid detection. Unfortunately, there has not been an existing assessment tool capable of evaluating the effectiveness of existing defense strategies against this kind of AI-based botnet attack. In this paper, we propose a sequential game theory model that is capable to analyse the details of the potential strategies botnet attackers and defenders could use to reach Nash Equilibrium (NE). The utility function is computed under the assumption when the attacker launches the maximum number of DDoS attacks with the minimum attack cost while the defender utilises the maximum number of defense strategies with the minimum defense cost. We conduct a numerical analysis based on a various number of defense strategies involved on different (simulated) cloud-band sizes in relation to different attack success rate values. Our experimental results confirm that the success of defense highly depends on the number of defense strategies used according to careful evaluation of attack rates.
Abstract:The rise of the new generation of cyber threats demands more sophisticated and intelligent cyber defense solutions equipped with autonomous agents capable of learning to make decisions without the knowledge of human experts. Several reinforcement learning methods (e.g., Markov) for automated network intrusion tasks have been proposed in recent years. In this paper, we introduce a new generation of network intrusion detection methods that combines a Q-learning-based reinforcement learning with a deep-feed forward neural network method for network intrusion detection. Our proposed Deep Q-Learning (DQL) model provides an ongoing auto-learning capability for a network environment that can detect different types of network intrusions using an automated trial-error approach and continuously enhance its detection capabilities. We provide the details of fine-tuning different hyperparameters involved in the DQL model for more effective self-learning. According to our extensive experimental results based on the NSL-KDD dataset, we confirm that the lower discount factor which is set as 0.001 under 250 episodes of training yields the best performance results. Our experimental results also show that our proposed DQL is highly effective in detecting different intrusion classes and outperforms other similar machine learning approaches.
Abstract:Network traffic data is a combination of different data bytes packets under different network protocols. These traffic packets have complex time-varying non-linear relationships. Existing state-of-the-art methods rise up to this challenge by fusing features into multiple subsets based on correlations and using hybrid classification techniques that extract spatial and temporal characteristics. This often requires high computational cost and manual support that limit them for real-time processing of network traffic. To address this, we propose a new novel feature extraction method based on covariance matrices that extract spatial-temporal characteristics of network traffic data for detecting malicious network traffic behavior. The covariance matrices in our proposed method not just naturally encode the mutual relationships between different network traffic values but also have well-defined geometry that falls in the Riemannian manifold. Riemannian manifold is embedded with distance metrics that facilitate extracting discriminative features for detecting malicious network traffic. We evaluated our model on NSL-KDD and UNSW-NB15 datasets and showed our proposed method significantly outperforms the conventional method and other existing studies on the dataset.