Samanvitha Basole

Multifamily Malware Models

Jun 27, 2022

Samanvitha Basole, Fabio Di Troia, Mark Stamp

Abstract: When training a machine learning model, there is likely to be a tradeoff between accuracy and the diversity of the dataset. Previous research has shown that if we train a model to detect one specific malware family, we generally obtain stronger results as compared to a case where we train a single model on multiple diverse families. However, during the detection phase, it would be more efficient to have a single model that can reliably detect multiple families, rather than having to score each sample against multiple models. In this research, we conduct experiments based on byte $n$-gram features to quantify the relationship between the generality of the training dataset and the accuracy of the corresponding machine learning models, all within the context of the malware detection problem. We find that neighborhood-based algorithms generalize surprisingly well, far outperforming the other machine learning techniques considered.
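To make the feature type concrete, here is a minimal sketch of byte $n$-gram counting paired with a nearest-neighbor vote. It is not the authors' pipeline: the function names, the cosine similarity measure, and the values of `n` and `k` are all assumptions chosen for illustration.

```python
from collections import Counter
import math


def byte_ngrams(data: bytes, n: int = 2) -> Counter:
    """Count overlapping byte n-grams in a byte string."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(sample: bytes, labeled, k: int = 3, n: int = 2) -> str:
    """Label a sample by majority vote among its k most similar neighbors.

    `labeled` is a list of (bytes, label) pairs; this helper is hypothetical
    and stands in for whichever neighborhood-based method is actually used.
    """
    query = byte_ngrams(sample, n)
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(query, byte_ngrams(item[0], n)),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

A real experiment would extract n-grams from executable files and would likely restrict the feature set to the most frequent n-grams; the sketch skips both steps.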

Cluster Analysis of Malware Family Relationships

Mar 07, 2021

Samanvitha Basole, Mark Stamp

Abstract: In this paper, we use $K$-means clustering to analyze various relationships between malware samples. We consider a dataset comprising 20 malware families with 1000 samples per family. These families can be categorized into seven different types of malware. We perform clustering based on pairs of families and use the results to determine relationships between families. We perform a similar cluster analysis based on malware type. Our results indicate that $K$-means clustering can be a powerful tool for data exploration of malware family relationships.
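For readers unfamiliar with the technique, the following is a minimal sketch of $K$-means (Lloyd's algorithm) on small numeric feature vectors. The function name, the random initialization, and the fixed iteration count are assumptions for illustration; the abstract does not state which features or implementation the authors used.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster tuples of floats into k groups with Lloyd's algorithm.

    Returns (assignments, centroids). Initialization picks k distinct
    input points at random; this is an illustrative choice only.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # under squared Euclidean distance.
        assignments = [
            min(
                range(k),
                key=lambda c: sum(
                    (p - q) ** 2 for p, q in zip(point, centroids[c])
                ),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(
                    sum(dim) / len(members) for dim in zip(*members)
                )
    return assignments, centroids
```

Clustering a pair of families with `k=2` and checking how cleanly the two clusters separate by family label is one way such output could be read as a measure of how closely the families are related.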

Malware Classification with GMM-HMM Models

Mar 03, 2021

Jing Zhao, Samanvitha Basole, Mark Stamp

Abstract: Discrete hidden Markov models (HMMs) are often applied to malware detection and classification problems. However, the continuous analog of discrete HMMs, that is, Gaussian mixture model-HMMs (GMM-HMMs), are rarely considered in the field of cybersecurity. In this paper, we use GMM-HMMs for malware classification and we compare our results to those obtained using discrete HMMs. As features, we consider opcode sequences and entropy-based sequences. For our opcode features, GMM-HMMs produce results that are comparable to those obtained using discrete HMMs, whereas for our entropy-based features, GMM-HMMs generally improve significantly on the classification results that we have achieved with discrete HMMs.
