Lukas Machlica

Decision-forest voting scheme for classification of rare classes in network intrusion detection

Jul 25, 2021

Jan Brabec, Lukas Machlica

Abstract: In this paper, Bayesian-based aggregation of decision trees in an ensemble (decision forest) is investigated. The focus is on multi-class classification where the number of samples is significantly skewed toward one of the classes. The algorithm leverages out-of-bag datasets to estimate the prediction errors of individual trees, which are then used in accordance with Bayes' rule to refine the decision of the ensemble. The algorithm takes the prevalence of individual classes into account and does not require setting any additional parameters related to class weights or decision-score thresholds. Evaluation is based on publicly available datasets as well as on a proprietary dataset comprising network traffic telemetry from hundreds of enterprise networks with over a million users overall. The aim is to increase the detection capabilities of an operating malware detection system. While keeping the precision of the system above 94%, that is, fewer than 6 out of 100 detections shown to the network administrator are false alarms, we achieved an increase of approximately 7% in the number of detections. The algorithm handles large amounts of data efficiently and can be used in conjunction with most state-of-the-art algorithms used to train decision forests.

* 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2018, pp. 3325-3330
* © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
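
The abstract does not state the exact aggregation formula. One way to combine tree votes with out-of-bag error estimates via Bayes' rule can be sketched as follows, assuming the trees' errors are conditionally independent given the true class (a naive-Bayes simplification). The function names and the Laplace-smoothing parameter `alpha` are hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


def oob_confusion(tree_preds, y_true, n_classes, alpha=1.0):
    """Estimate P(tree predicts j | true class i) from one tree's out-of-bag samples.

    alpha is a hypothetical Laplace-smoothing constant that avoids zero probabilities.
    """
    counts = np.full((n_classes, n_classes), alpha, dtype=float)
    for t, p in zip(y_true, tree_preds):
        counts[t, p] += 1.0  # row = true class, column = predicted class
    return counts / counts.sum(axis=1, keepdims=True)


def bayes_vote(votes, confusions, priors):
    """Posterior P(c | votes) for one sample.

    Assumes tree errors are conditionally independent given the true class,
    so the likelihood is a product of per-tree confusion-matrix columns.
    """
    log_post = np.log(np.asarray(priors, dtype=float))
    for v, cm in zip(votes, confusions):
        log_post = log_post + np.log(cm[:, v])
    log_post -= log_post.max()  # numerical stability before exponentiating
    post = np.exp(log_post)
    return post / post.sum()


# Toy example: a tree that is right on all four of its OOB samples.
cm = oob_confusion([0, 0, 1, 1], [0, 0, 1, 1], n_classes=2)
print(cm)  # rows are 0.75/0.25 and 0.25/0.75 after smoothing
print(bayes_vote([1, 1, 1], [cm] * 3, priors=[0.5, 0.5]))
print(bayes_vote([1, 1, 1], [cm] * 3, priors=[0.9, 0.1]))
```

With equal priors, three agreeing votes give class 1 a posterior of 27/28 ≈ 0.96; with a 10% prior the same votes give 0.75, which illustrates how class prevalence enters the decision without a hand-tuned threshold.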

Joint Detection of Malicious Domains and Infected Clients

Jun 21, 2019

Paul Prasse, Rene Knaebel, Lukas Machlica, Tomas Pevny, Tobias Scheffer

Abstract: Detection of malware-infected computers and detection of malicious web domains based on their encrypted HTTPS traffic are challenging problems, because only addresses, timestamps, and data volumes are observable. The detection problems are coupled, because infected clients tend to interact with malicious domains. Traffic data can be collected at a large scale, and antivirus tools can be used to identify infected clients in retrospect. Domains, by contrast, have to be labeled individually after forensic analysis. We explore transfer learning based on sluice networks; this allows the detection models to bootstrap each other. In a large-scale experimental study, we find that the model outperforms known reference models and detects previously unknown malware, previously unknown malware families, and previously unknown malicious domains.

* Mach Learn (2019)
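
Sluice networks let task-specific networks exchange information through learned mixing weights. Their simplest ingredient, a cross-stitch-style linear mix of two hidden layers, can be sketched as follows. Full sluice networks generalize this by also splitting layers into subspaces and learning layer-skip weights. The function name, shapes, and fixed `alpha` values are illustrative assumptions; in a trained model `alpha` would be learned jointly with the client and domain detectors.

```python
import numpy as np


def sluice_mix(h_client, h_domain, alpha):
    """Linearly mix hidden activations of two task networks.

    h_client, h_domain: arrays of shape (batch, d) from the client-level and
    domain-level networks (hypothetical names; equal width assumed).
    alpha: 2x2 mixing matrix; row i holds the weights for task i's next layer.
    """
    stacked = np.stack([h_client, h_domain], axis=0)   # (2, batch, d)
    mixed = np.einsum("ij,jbd->ibd", alpha, stacked)   # (2, batch, d)
    return mixed[0], mixed[1]


alpha = np.array([[0.9, 0.1],    # client task keeps mostly its own features
                  [0.2, 0.8]])   # domain task borrows 20% from the client task
h_c = np.ones((4, 8))
h_d = np.zeros((4, 8))
out_c, out_d = sluice_mix(h_c, h_d, alpha)
print(out_c[0, 0], out_d[0, 0])
```

An identity `alpha` reduces this to two independent networks; the off-diagonal weights are what let information flow between the two detection tasks.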

Bad practices in evaluation methodology relevant to class-imbalanced problems

Dec 04, 2018

Jan Brabec, Lukas Machlica

Abstract: For research to go in the right direction, it is essential to be able to compare and quantify the performance of different algorithms focused on the same problem. Choosing a suitable evaluation metric requires a deep understanding of the pursued task along with all of its characteristics. We argue that in applied machine learning, the proper evaluation metric is the basic building block that should be in the spotlight and put under thorough examination. Here, we address tasks with class imbalance, in which the class of interest is the one with a much lower number of samples. We encountered a non-negligible number of recent papers in which improper evaluation methods, borrowed mainly from the field of balanced problems, are used. Such bad practices may heavily bias the results in favour of inappropriate algorithms and give false expectations of the state of the field.

* Accepted to Critiquing and Correcting Trends in Machine Learning workshop at NeurIPS 2018 (https://ml-critique-correct.github.io/)
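
The abstract does not name specific metrics. One widely known pitfall of this kind can be illustrated with a short calculation: the true-positive rate (TPR) and false-positive rate (FPR), and hence ROC curves, do not depend on class prevalence π, whereas precision = TPR·π / (TPR·π + FPR·(1 − π)) does. The detector rates and prevalences below are made-up values chosen only for illustration.

```python
def precision_from_rates(tpr, fpr, prevalence):
    """Precision implied by a detector's TPR and FPR at a given positive-class prevalence."""
    tp = tpr * prevalence            # expected true-positive fraction of all samples
    fp = fpr * (1.0 - prevalence)    # expected false-positive fraction of all samples
    return tp / (tp + fp)


# Same detector (same ROC operating point), two different class ratios.
for prevalence in (0.5, 0.001):
    print(f"prevalence={prevalence}: precision={precision_from_rates(0.9, 0.01, prevalence):.3f}")
```

The detector's operating point is identical in both cases, yet precision drops from about 0.989 on balanced data to about 0.083 at 0.1% prevalence, so most detections become false alarms even though the ROC-based view is unchanged.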

Learning detectors of malicious web requests for intrusion detection in network traffic

Feb 08, 2017

Lukas Machlica, Karel Bartos, Michal Sofka

Abstract: This paper proposes a generic classification system designed to detect security threats based on the behavior of malware samples. The system relies on statistical features computed from proxy log fields to train detectors using a database of malware samples. The behavior detectors serve as basic reusable building blocks of the multi-level detection architecture. The detectors identify malicious communication exploiting encrypted URL strings and domains generated by a Domain Generation Algorithm (DGA), which are frequently used in Command and Control (C&C), phishing, and click fraud. Surprisingly, very precise detectors can be built given only a limited amount of information extracted from a single proxy log. This keeps the computational requirements of the detectors low, which allows deployment on a wide range of security devices without depending on traffic context such as DNS logs, Whois records, webpage content, etc. Results on several weeks of live traffic from 100+ companies with 350k+ hosts show correct detection with a precision exceeding 95% for malicious flows, 95% for malicious URLs, and 90% for infected hosts. In addition, a comparison with a signature- and rule-based solution shows that our system is able to detect a significant number of new threats.
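
The abstract does not enumerate the features. The kind of per-request statistics that can be computed from a single proxy-log URL might look like the following sketch. The feature set is a hypothetical example (lengths, digit ratio, and the character entropy of the second-level domain label, a common signal for randomly generated DGA names), not the paper's actual features. The naive second-level-label extraction ignores multi-part public suffixes such as `co.uk`.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(s):
    """Shannon entropy (bits per character) of a string."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def url_features(url):
    """Hypothetical per-request features computed from one proxy-log URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    labels = host.split(".")
    # Naive second-level label, e.g. "example" in "www.example.com".
    sld = labels[-2] if len(labels) >= 2 else host
    return {
        "url_length": len(url),
        "path_length": len(parts.path),
        "query_length": len(parts.query),
        "num_dots_host": host.count("."),
        "digit_ratio_host": sum(ch.isdigit() for ch in host) / max(len(host), 1),
        "sld_length": len(sld),
        "sld_entropy": shannon_entropy(sld),
    }


for url in ("http://www.example.com/index.html",
            "http://xkq3vz9p1lm7.com/a8f3e2?id=77"):
    f = url_features(url)
    print(url, round(f["sld_entropy"], 2), f["digit_ratio_host"])
```

The random-looking second domain yields a higher label entropy and a non-zero digit ratio, the sort of signal a trained detector could pick up from a single log entry without external context.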
