Martin Kopp

Command-line Obfuscation Detection using Small Language Models

Aug 05, 2024

Vojtech Outrata, Michael Adam Polak, Martin Kopp

Abstract: To avoid detection, adversaries often use command-line obfuscation. There are numerous command-line obfuscation techniques, all designed to alter the command-line syntax without affecting its original functionality. This variability forces most security solutions to create an exhaustive enumeration of signatures for even a single pattern. In contrast to using signatures, we have implemented a scalable NLP-based detection method that leverages a custom-trained, small transformer language model that can be applied to any source of execution logs. Evaluation on real-world telemetry demonstrates that our approach yields high-precision detections even on high-volume telemetry from a diverse set of environments, spanning from universities and businesses to healthcare and finance. The practical value is demonstrated in a case study of real-world samples detected by our model. We show the model's superiority to signatures on established malware known to employ obfuscation and showcase previously unseen obfuscated samples detected by our model.
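The signature problem mentioned in the abstract can be illustrated with a small sketch. It is not the paper's method: the regular expression, the sample command lines, and the `strip_carets` helper are all hypothetical. It relies only on the fact that `cmd.exe` treats the caret `^` as an escape character, so inserting carets changes the text of a command without changing what runs.

```python
import re

# Hypothetical signature: a PowerShell launch with an encoded command.
SIGNATURE = re.compile(r"powershell(\.exe)?\s+-enc", re.IGNORECASE)


def matches_signature(command_line: str) -> bool:
    """Return True if the naive signature matches the raw command line."""
    return SIGNATURE.search(command_line) is not None


def strip_carets(command_line: str) -> str:
    """Undo one obfuscation technique only: cmd.exe caret insertion."""
    return command_line.replace("^", "")


# Hypothetical samples; "<payload>" is a placeholder, not real data.
plain = "powershell -enc <payload>"
obfuscated = "p^ow^er^she^ll -e^nc <payload>"

print(matches_signature(plain))
print(matches_signature(obfuscated))
print(matches_signature(strip_carets(obfuscated)))
```

Each additional obfuscation technique would need its own normalization rule or signature variant in this style, which is the enumeration burden the abstract describes.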

A framework for comprehensible multi-modal detection of cyber threats

Nov 10, 2021

Jan Kohout, Čeněk Škarda, Kyrylo Shcherbin, Martin Kopp, Jan Brabec

Abstract: Detecting malicious activity in corporate environments is a very complex task, and much effort has been invested in research on its automation. However, the vast majority of existing methods operate only in a narrow scope, which limits them to capturing only fragments of the evidence of malware's presence. Consequently, such an approach is not aligned with the way cyber threats are studied and described by domain experts. In this work, we discuss these limitations and design a detection framework that combines observed events from different data sources. As a result, it provides full insight into the attack life cycle and enables detection of threats that require this coupling of observations from different telemetries to identify the full scope of the incident. We demonstrate the applicability of the framework in a case study of a real malware infection observed in a corporate network.

Cluster Representatives Selection in Non-Metric Spaces for Nearest Prototype Classification

Jul 03, 2021

Jaroslav Hlaváč, Martin Kopp, Jan Kohout

Abstract: Nearest prototype classification is a less computationally intensive replacement for the $k$-NN method, especially when large datasets are considered. In metric spaces, centroids are often used as prototypes to represent whole clusters. The selection of cluster prototypes in non-metric spaces is more challenging, as the idea of computing centroids is not directly applicable. In this paper, we present CRS, a novel method for selecting a small yet representative subset of objects as a cluster prototype. Memory- and computation-efficient selection of representatives is enabled by leveraging the similarity graph representation of each cluster created by the NN-Descent algorithm. CRS can be used in an arbitrary metric or non-metric space because of the graph-based approach, which requires only a pairwise similarity measure. As we demonstrate in the experimental evaluation, our method outperforms state-of-the-art techniques on multiple datasets from different domains.
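A minimal sketch of nearest prototype classification with only a pairwise similarity measure may help make the setting concrete. It does not implement CRS or NN-Descent: prototypes are chosen here by a brute-force, medoid-style rule (highest total similarity to the rest of the cluster), and the Jaccard measure, the toy clusters, and all function names are assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List

Item = FrozenSet[str]
Similarity = Callable[[Item, Item], float]


def jaccard(a: Item, b: Item) -> float:
    """Jaccard similarity between two sets (assumed measure for this sketch)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_prototypes(cluster: List[Item], sim: Similarity, k: int = 1) -> List[Item]:
    """Pick the k objects with the highest total similarity to the cluster.

    Brute force, O(n^2) similarity evaluations; a stand-in for CRS, not CRS itself.
    """
    ranked = sorted(
        cluster,
        key=lambda x: sum(sim(x, y) for y in cluster),
        reverse=True,
    )
    return ranked[:k]


def classify(obj: Item, prototypes: Dict[str, List[Item]], sim: Similarity) -> str:
    """Return the label of the cluster owning the most similar prototype."""
    best_label, best_sim = "", float("-inf")
    for label, protos in prototypes.items():
        for proto in protos:
            s = sim(obj, proto)
            if s > best_sim:
                best_label, best_sim = label, s
    return best_label


# Toy clusters (hypothetical data).
clusters = {
    "fruit": [
        frozenset({"apple", "pear", "plum"}),
        frozenset({"apple", "pear"}),
        frozenset({"apple", "plum", "fig"}),
    ],
    "tools": [
        frozenset({"saw", "hammer"}),
        frozenset({"saw", "hammer", "drill"}),
        frozenset({"drill", "hammer"}),
    ],
}

prototypes = {label: select_prototypes(items, jaccard) for label, items in clusters.items()}
predicted = classify(frozenset({"pear", "apple", "fig"}), prototypes, jaccard)
print(predicted)
```

Classification compares a query only against the stored prototypes rather than against every training object, which is where the saving over $k$-NN comes from.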

Learning Explainable Representations of Malware Behavior

Jun 23, 2021

Paul Prasse, Jan Brabec, Jan Kohout, Martin Kopp, Lukas Bajer, Tobias Scheffer

Abstract: We address the problems of identifying malware in network telemetry logs and providing indicators of compromise: comprehensible explanations of behavioral patterns that identify the threat. In our system, an array of specialized detectors abstracts network-flow data into comprehensible network events in a first step. We develop a neural network that processes this sequence of events and identifies specific threats, malware families and broad categories of malware. We then use the integrated-gradients method to highlight events that jointly constitute the characteristic behavioral pattern of the threat. We compare network architectures based on CNNs, LSTMs, and transformers, and explore the efficacy of unsupervised pre-training experimentally on large-scale telemetry data. We demonstrate how this system detects njRAT and other malware based on behavioral patterns.

* This is a pre-print of an article to appear in Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2021
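The integrated-gradients method named in the abstract can be sketched on a toy model. This is an illustration under stated assumptions, not the paper's system: the model is a hypothetical three-feature logistic unit with hand-picked weights rather than a neural network over event sequences, the baseline is the all-zeros input, and the path integral is approximated with a midpoint Riemann sum.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def model(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Toy logistic model (hypothetical stand-in for a trained network)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def model_grad(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Analytic gradient of the toy model with respect to its input."""
    p = model(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    w: Sequence[float],
    b: float,
    steps: int = 200,
) -> List[float]:
    """Approximate integrated gradients along the straight path baseline -> x."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + alpha * (xi - bi) for xi, bi in zip(x, baseline)]
        for i, g in enumerate(model_grad(point, w, b)):
            avg_grad[i] += g / steps
    # Scale the averaged gradient by the input's distance from the baseline.
    return [(xi - bi) * gi for xi, bi, gi in zip(x, baseline, avg_grad)]


# Hypothetical weights and input.
w, b = [2.0, -1.0, 0.0], -0.5
x, baseline = [1.0, 1.0, 1.0], [0.0, 0.0, 0.0]

attributions = integrated_gradients(x, baseline, w, b)
gap = abs(sum(attributions) - (model(x, w, b) - model(baseline, w, b)))
print(attributions, gap)
```

The attributions sum, up to numerical error, to the difference between the model output at the input and at the baseline. Features with larger absolute attribution are the ones highlighted as contributing most to the prediction.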
