Eric L. Goodman

Packet2Vec: Utilizing Word2Vec for Feature Extraction in Packet Data

Apr 29, 2020

Eric L. Goodman, Chase Zimmerman, Corey Hudson

Abstract: One of deep learning's attractive benefits is the ability to automatically extract relevant features for a target problem from largely raw data, instead of relying on human-engineered, error-prone handcrafted features. While deep learning has shown success in fields such as image classification and natural language processing, its application to feature extraction on raw network packet data for intrusion detection is largely unexplored. In this paper we modify a Word2Vec approach, used for text processing, and apply it to packet data for automatic feature extraction. We call this approach Packet2Vec. For the classification task of benign versus malicious traffic on a 2009 DARPA network data set, we obtain an area under the curve (AUC) of the receiver operating characteristic (ROC) between 0.988 and 0.996 and an AUC of the Precision/Recall curve between 0.604 and 0.667.

* MLDM 2019
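
To make the idea of treating packet data like text more concrete, here is a minimal sketch of how raw packet bytes might be turned into "words" and skip-gram training pairs of the kind Word2Vec consumes. The abstract does not specify the tokenization, so the byte n-gram scheme, the function names `packet_to_tokens` and `skipgram_pairs`, and the sample bytes are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Tuple


def packet_to_tokens(payload: bytes, n: int = 2, stride: int = 1) -> List[str]:
    """Slide an n-byte window over the packet and hex-encode each window.

    Assumption: byte n-grams stand in for words; the paper may tokenize differently.
    """
    return [payload[i:i + n].hex() for i in range(0, len(payload) - n + 1, stride)]


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Build (center, context) pairs from tokens within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical six-byte fragment used only as sample input.
packet = bytes.fromhex("4500003c1c46")
tokens = packet_to_tokens(packet)
pairs = skipgram_pairs(tokens, window=1)
print(tokens)
print(len(pairs))
```

Pairs like these would then be fed to a Word2Vec-style model to learn an embedding per token; how the embeddings are combined into per-packet features is not described in the abstract and is left out here.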

A Streaming Analytics Language for Processing Cyber Data

Nov 03, 2019

Eric L. Goodman, Dirk Grunwald

Abstract: We present a domain-specific language called SAL (the Streaming Analytics Language) for processing data in a semi-streaming model. In particular, we examine the use case of processing netflow data in order to identify malicious actors within a network. Because of the large volume of data generated by networks, it is often only feasible to process the data in a single pass, using a streaming (O(polylog n) space) or semi-streaming (O(n polylog n) space) computing model. Despite these constraints, we achieve an average AUC of the ROC curve of 0.87 across a set of botnet detection scenarios. The implementation of an interpreter for SAL, which we call SAM (Streaming Analytics Machine), shows improved throughput when scaling up to 61 nodes (976 cores), with an overall rate of 373,000 netflows per second, or 32.2 billion per day. SAL provides a succinct way to describe common analyses that allow cyber analysts to find data of interest, and SAM is a scalable interpreter of the language.

* Machine Learning and Data Mining 2019
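
The throughput figures in the abstract can be checked with a few lines of arithmetic. This sketch only restates the numbers given above (373,000 netflows per second, 61 nodes, 976 cores); the variable names are illustrative, and the per-core rate is a derived average, not a figure reported in the abstract.

```python
NETFLOWS_PER_SECOND = 373_000
NODES = 61
CORES = 976
SECONDS_PER_DAY = 24 * 60 * 60

# Daily volume implied by the per-second rate.
netflows_per_day = NETFLOWS_PER_SECOND * SECONDS_PER_DAY

# Cores per node implied by the node and core counts.
cores_per_node = CORES // NODES

# Derived average rate per core (not stated in the abstract).
netflows_per_core_per_second = NETFLOWS_PER_SECOND / CORES

print(f"{netflows_per_day / 1e9:.1f} billion netflows per day")
print(f"{cores_per_node} cores per node")
print(f"{netflows_per_core_per_second:.0f} netflows per core per second")
```

The daily total works out to about 32.2 billion, which matches the abstract, and the core count divides evenly into 16 cores per node.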
