Ritik Mehta

Malware Classification using a Hybrid Hidden Markov Model-Convolutional Neural Network

Dec 25, 2024

Ritik Mehta, Olha Jureckova, Mark Stamp

Abstract: The proliferation of malware variants poses significant challenges to traditional malware detection approaches, such as signature-based methods, necessitating the development of advanced machine learning techniques. In this research, we present a novel approach based on a hybrid architecture combining features extracted using a Hidden Markov Model (HMM), with a Convolutional Neural Network (CNN) then used for malware classification. Inspired by the strong results in previous work using an HMM-Random Forest model, we propose integrating HMMs, which serve to capture sequential patterns in opcode sequences, with CNNs, which are adept at extracting hierarchical features. We demonstrate the effectiveness of our approach on the popular Malicia dataset, and we obtain superior performance, as compared to other machine learning methods -- our results surpass the aforementioned HMM-Random Forest model. Our findings underscore the potential of hybrid HMM-CNN architectures in bolstering malware classification capabilities, offering several promising avenues for further research in the field of cybersecurity.

* arXiv admin note: substantial text overlap with arXiv:2307.11032

A Natural Language Processing Approach to Malware Classification

Jul 07, 2023

Ritik Mehta, Olha Jurečková, Mark Stamp

Abstract: Many different machine learning and deep learning techniques have been successfully employed for malware detection and classification. Examples of popular learning techniques in the malware domain include Hidden Markov Models (HMM), Random Forests (RF), Convolutional Neural Networks (CNN), Support Vector Machines (SVM), and Recurrent Neural Networks (RNN) such as Long Short-Term Memory (LSTM) networks. In this research, we consider a hybrid architecture, where HMMs are trained on opcode sequences, and the resulting hidden states of these trained HMMs are used as feature vectors in various classifiers. In this context, extracting the HMM hidden state sequences can be viewed as a form of feature engineering that is somewhat analogous to techniques that are commonly employed in Natural Language Processing (NLP). We find that this NLP-based approach outperforms other popular techniques on a challenging malware dataset, with an HMM-Random Forest model yielding the best results.
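
The feature-engineering step described in this abstract (opcode sequence in, hidden state sequence out) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say how hidden states are decoded, so Viterbi decoding is assumed here, and the two-state HMM parameters, the three-opcode alphabet, the padding value, and the helper name `hidden_state_features` are all hypothetical. In the paper the HMM is trained on opcode sequences; here its parameters are simply written down by hand.

```python
import math

# Hypothetical toy HMM (hand-picked, NOT trained): 2 hidden states, 3 opcodes.
OPCODES = {"mov": 0, "add": 1, "jmp": 2}
START = [0.6, 0.4]                          # initial state distribution
TRANS = [[0.7, 0.3], [0.4, 0.6]]            # TRANS[i][j] = P(state j | state i)
EMIT = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]   # EMIT[s][o] = P(opcode o | state s)
PAD = -1                                    # hypothetical padding value


def _log(p):
    """Log-probability; zero probability maps to -inf."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a list of observation indices."""
    if not obs:
        return []
    n = len(start_p)
    # Work in log space to avoid underflow on long sequences.
    score = [_log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        prev, score, pointers = score, [], []
        for s in range(n):
            best = max(range(n), key=lambda r: prev[r] + _log(trans_p[r][s]))
            pointers.append(best)
            score.append(prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o]))
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def hidden_state_features(opcodes, length):
    """Decode hidden states, then truncate or pad to a fixed-length vector."""
    obs = [OPCODES[op] for op in opcodes[:length]]
    path = viterbi(obs, START, TRANS, EMIT)
    return path + [PAD] * (length - len(path))


features = hidden_state_features(["mov", "mov", "jmp", "jmp"], 6)
print(features)  # [0, 0, 1, 1, -1, -1]
```

Fixed-length vectors of this kind could then be passed to any standard classifier, such as a Random Forest; the choice of vector length and padding scheme is a design decision that the abstract does not specify.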
