Nathan Bartley

Characterizing Activity on the Deep and Dark Web

Mar 01, 2019

Nazgol Tavabi, Nathan Bartley, Andrés Abeliuk, Sandeep Soni, Emilio Ferrara, Kristina Lerman

Abstract: The deep and dark web (d2web) refers to limited-access websites that require registration, authentication, or more complex encryption protocols to access them. These websites serve as hubs for a variety of illicit activities: trading drugs, stolen user credentials, and hacking tools, and coordinating attacks and manipulation campaigns. Despite its importance to cyber crime, the d2web has not been systematically investigated. In this paper, we study a large corpus of messages posted to 80 d2web forums over a period of more than a year. We identify topics of discussion using LDA and use a non-parametric HMM to model the evolution of topics across forums. Then, we examine the dynamic patterns of discussion and identify forums with similar patterns. We show that our approach surfaces hidden similarities across different forums and can help identify anomalous events in this rich, heterogeneous data.

Discovering Signals from Web Sources to Predict Cyber Attacks

Jun 08, 2018

Palash Goyal, KSM Tozammel Hossain, Ashok Deb, Nazgol Tavabi, Nathan Bartley, Andrés Abeliuk, Emilio Ferrara, Kristina Lerman

Abstract: Cyber attacks are growing in frequency and severity. Over the past year alone we have witnessed massive data breaches that stole the personal information of millions of people and wide-scale ransomware attacks that paralyzed the critical infrastructure of several countries. Combating the rising cyber threat calls for a multi-pronged strategy, which includes predicting when these attacks will occur. The intuition driving our approach is this: during the planning and preparation stages, hackers leave digital traces of their activities on both the surface web and dark web in the form of discussions on platforms like hacker forums, social media, blogs and the like. These data provide predictive signals that make it possible to anticipate cyber attacks. In this paper, we describe machine learning techniques based on deep neural networks and autoregressive time series models that leverage external signals from publicly available Web sources to forecast cyber attacks. The performance of our framework on ground truth data across real-world forecasting tasks shows that our methods yield a significant lift, or increase in F1, for the top signals on predicted cyber attacks. Our results suggest that, when deployed, our system will be able to provide an effective line of defense against various types of targeted cyber attacks.

Evaluation of a Tree-based Pipeline Optimization Tool for Automating Data Science

Mar 20, 2016

Randal S. Olson, Nathan Bartley, Ryan J. Urbanowicz, Jason H. Moore

Abstract: As the field of data science continues to grow, there will be an ever-increasing demand for tools that make machine learning accessible to non-experts. In this paper, we introduce the concept of tree-based pipeline optimization for automating one of the most tedious parts of machine learning: pipeline design. We implement an open source Tree-based Pipeline Optimization Tool (TPOT) in Python and demonstrate its effectiveness on a series of simulated and real-world benchmark data sets. In particular, we show that TPOT can design machine learning pipelines that provide a significant improvement over a basic machine learning analysis while requiring little to no input or prior knowledge from the user. We also address the tendency for TPOT to design overly complex pipelines by integrating Pareto optimization, which produces compact pipelines without sacrificing classification accuracy. As such, this work represents an important step toward fully automating machine learning pipeline design.

* 8 pages, 5 figures, preprint to appear in GECCO 2016, edits not yet made from reviewer comments
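
The Pareto optimization idea mentioned in the abstract can be illustrated with a minimal sketch. This is not TPOT's implementation; it only shows, under assumed data, how a Pareto front keeps candidates that trade off accuracy against pipeline size. The candidate names, accuracies, and operator counts below are hypothetical.

```python
from typing import List, Tuple

# Hypothetical candidate pipelines: (name, accuracy, number of operators).
Candidate = Tuple[str, float, int]


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return the candidates that no other candidate dominates.

    A candidate is dominated when another one is at least as accurate
    and at least as small, and strictly better on one of the two.
    """
    front = []
    for name, acc, size in candidates:
        dominated = any(
            (o_acc >= acc and o_size <= size) and (o_acc > acc or o_size < size)
            for _, o_acc, o_size in candidates
        )
        if not dominated:
            front.append((name, acc, size))
    return front


# Illustrative values only, not results from the paper.
candidates = [
    ("A", 0.90, 5),  # accurate but large
    ("B", 0.90, 2),  # same accuracy as A with fewer operators
    ("C", 0.85, 1),  # smallest pipeline, slightly less accurate
    ("D", 0.80, 3),  # worse than B on both criteria
]
front = pareto_front(candidates)
print([name for name, _, _ in front])
```

In this toy example the large pipeline A drops out because B matches its accuracy with fewer operators, which is the sense in which a Pareto criterion favors compact pipelines without giving up accuracy.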
