Dominic Seyler

Towards Dark Jargon Interpretation in Underground Forums

Nov 05, 2020

Dominic Seyler, Wei Liu, XiaoFeng Wang, ChengXiang Zhai

Abstract: Dark jargons are benign-looking words that have hidden, sinister meanings and are used by participants of underground forums for illicit behavior. For example, the dark term "rat" is often used in lieu of "Remote Access Trojan". In this work we present a novel method towards automatically identifying and interpreting dark jargons. We formalize the problem as a mapping from dark words to "clean" words with no hidden meaning. Our method makes use of interpretable representations of dark and clean words in the form of probability distributions over a shared vocabulary. In our experiments we show our method to be effective in terms of dark jargon identification, as it outperforms another related method on simulated data. Using manual evaluation, we show that our method is able to detect dark jargons in a real-world underground forum dataset.
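
The idea of representing dark and clean words as probability distributions over a shared vocabulary can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the context window, the additive smoothing, the use of Jensen-Shannon divergence as the comparison measure, and the toy corpora and function names.

```python
from collections import Counter
import math


def context_distribution(sentences, target, vocab, window=2, alpha=0.01):
    """Smoothed distribution over `vocab` of words seen near `target`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i and tokens[j] in vocab:
                    counts[tokens[j]] += 1
    # Additive smoothing keeps every probability strictly positive.
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def js_divergence(p, q):
    """Jensen-Shannon divergence between two distributions on the same keys."""
    def kl(a, b):
        return sum(a[w] * math.log(a[w] / b[w]) for w in a)
    m = {w: 0.5 * (p[w] + q[w]) for w in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def closest_clean_word(dark_word, dark_corpus, clean_corpus, candidates, vocab):
    """Map a dark word to the clean candidate with the most similar contexts."""
    p = context_distribution(dark_corpus, dark_word, vocab)
    scored = [
        (js_divergence(p, context_distribution(clean_corpus, c, vocab)), c)
        for c in candidates
    ]
    return min(scored)[1]


# Toy corpora (hypothetical data, tokenized by hand).
dark_corpus = [
    ["install", "rat", "on", "victim", "machine"],
    ["the", "rat", "gives", "remote", "access"],
]
clean_corpus = [
    ["install", "trojan", "on", "victim", "machine"],
    ["the", "trojan", "gives", "remote", "access"],
    ["the", "mouse", "eats", "cheese"],
    ["a", "mouse", "ran", "across", "the", "floor"],
]
vocab = {tok for sent in dark_corpus + clean_corpus for tok in sent}
best = closest_clean_word("rat", dark_corpus, clean_corpus, ["trojan", "mouse"], vocab)
print(best)
```

On this toy data "rat" and "trojan" occur in identical contexts, so their distributions coincide and "trojan" is selected over "mouse". Because each distribution is indexed by ordinary vocabulary words, the largest entries can be read directly, which is one sense in which such a representation is interpretable.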

Identifying Compromised Accounts on Social Media Using Statistical Text Analysis

Apr 19, 2018

Dominic Seyler, Lunan Li, ChengXiang Zhai

Abstract: Compromised social media accounts are legitimate user accounts that have been hijacked by a third (malicious) party and can cause various kinds of damage. Early detection of such compromised accounts is very important in order to control the damage. In this work we propose a novel general framework for discovering compromised accounts by utilizing statistical text analysis. The framework is built on the observation that users will use language that is measurably different from the language that a hacker (or spammer) would use, when the account is compromised. We use the framework to develop specific algorithms based on language modeling and use the similarity of language models of users and spammers as features in a supervised learning setup to identify compromised accounts. Evaluation results on a large Twitter corpus of over 129 million tweets show promising results of the proposed approach.
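
A minimal sketch of how language-model similarity could be turned into features is given below. It assumes unigram models with additive smoothing and KL divergence as the similarity measure; the abstract does not specify these choices, and the function names, feature names, and example tweets are hypothetical.

```python
from collections import Counter
import math


def unigram_lm(texts, vocab, alpha=0.1):
    """Additively smoothed unigram language model over a fixed vocabulary."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q); both models are smoothed, so no zero probabilities occur."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def account_features(history, recent, spam_reference):
    """Divergence of recent tweets from the user's past and from known spam."""
    vocab = {
        tok
        for t in history + recent + spam_reference
        for tok in t.lower().split()
    }
    recent_lm = unigram_lm(recent, vocab)
    return {
        "kl_recent_vs_history": kl_divergence(recent_lm, unigram_lm(history, vocab)),
        "kl_recent_vs_spam": kl_divergence(recent_lm, unigram_lm(spam_reference, vocab)),
    }


# Hypothetical example data.
history = ["great run this morning", "coffee with friends this morning"]
spam = ["win a free phone click here", "free gift cards click here now"]

suspicious = account_features(history, ["click here for a free phone"], spam)
ordinary = account_features(history, ["coffee after my run this morning"], spam)
print(suspicious)
print(ordinary)
```

In a supervised setup, feature dictionaries like these would be computed per account and passed to a classifier of one's choice together with compromised or not-compromised labels. For the suspicious example the recent tweets are closer to the spam model than to the user's own history, and the reverse holds for the ordinary one.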

* 10 pages

KnowNER: Incremental Multilingual Knowledge in Named Entity Recognition

Sep 11, 2017

Dominic Seyler, Tatiana Dembelova, Luciano Del Corro, Johannes Hoffart, Gerhard Weikum

Abstract: KnowNER is a multilingual Named Entity Recognition (NER) system that leverages different degrees of external knowledge. A novel modular framework divides the knowledge into four categories according to the depth of knowledge they convey. Each category consists of a set of features automatically generated from different information sources (such as a knowledge base, a list of names or document-specific semantic annotations) and is used to train a conditional random field (CRF). Since those information sources are usually multilingual, KnowNER can be easily trained for a wide range of languages. In this paper, we show that the incorporation of deeper knowledge systematically boosts accuracy, and we compare KnowNER with state-of-the-art NER approaches across three languages (English, German and Spanish), where it performs on par with state-of-the-art systems in all of them.
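
To make the notion of knowledge-derived CRF features concrete, here is a small sketch of a per-token feature extractor. It is an illustration under stated assumptions, not KnowNER's feature set: the two knowledge sources shown (a name list and a knowledge-base type lookup), the feature names, and the example data are all hypothetical.

```python
def token_features(tokens, i, name_list, kb_types):
    """Feature dictionary for token i, mixing surface and knowledge features."""
    tok = tokens[i]
    return {
        # Knowledge-agnostic surface features.
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Features derived from external sources (both are assumptions here).
        "in_name_list": tok.lower() in name_list,
        "kb_type": kb_types.get(tok.lower(), "NONE"),
    }


def sentence_features(tokens, name_list, kb_types):
    """One feature dictionary per token, in sentence order."""
    return [token_features(tokens, i, name_list, kb_types) for i in range(len(tokens))]


# Hypothetical knowledge sources and input sentence.
name_list = {"angela", "merkel"}
kb_types = {"paris": "LOCATION"}
tokens = ["Angela", "Merkel", "visited", "Paris"]

features = sentence_features(tokens, name_list, kb_types)
for tok, feats in zip(tokens, features):
    print(tok, feats)
```

Sequences of feature dictionaries like these are a common input format for CRF toolkits. Swapping the name list or type lookup for resources in another language leaves the extractor unchanged, which reflects the multilingual argument made in the abstract.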

Knowledge Questions from Knowledge Graphs

Nov 01, 2016

Dominic Seyler, Mohamed Yahya, Klaus Berberich

Abstract: We address the novel problem of automatically generating quiz-style knowledge questions from a knowledge graph such as DBpedia. Questions of this kind have ample applications, for instance, to educate users about or to evaluate their knowledge in a specific domain. To solve the problem, we propose an end-to-end approach. The approach first selects a named entity from the knowledge graph as an answer. It then generates a structured triple-pattern query, which yields the answer as its sole result. If a multiple-choice question is desired, the approach selects alternative answer options. Finally, our approach uses a template-based method to verbalize the structured query and yield a natural language question. A key challenge is estimating how difficult the generated question is to human users. To do this, we make use of historical data from the Jeopardy! quiz show and a semantically annotated Web-scale document collection, engineer suitable features, and train a logistic regression classifier to predict question difficulty. Experiments demonstrate the viability of our overall approach.
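
The pipeline steps described above (pick an answer entity, find a triple-pattern query with that entity as its sole result, choose alternative options, verbalize with a template) can be sketched on a toy graph. This is a simplified illustration, not the authors' system: the triples, the single template, the single-pattern queries, and all function names are assumptions, and the difficulty classifier is not covered.

```python
import random

# Toy knowledge graph as (subject, predicate, object) triples (hypothetical).
TRIPLES = [
    ("Berlin", "capitalOf", "Germany"),
    ("Paris", "capitalOf", "France"),
    ("Madrid", "capitalOf", "Spain"),
    ("Berlin", "type", "City"),
    ("Paris", "type", "City"),
    ("Madrid", "type", "City"),
    ("Germany", "type", "Country"),
]

# One verbalization template per predicate (hypothetical).
TEMPLATES = {"capitalOf": "Which city is the capital of {obj}?"}


def matches(patterns):
    """Entities ?x such that (?x, pred, obj) holds for every given pattern."""
    result = None
    for pred, obj in patterns:
        subjects = {s for s, p, o in TRIPLES if p == pred and o == obj}
        result = subjects if result is None else result & subjects
    return result or set()


def unique_query(answer):
    """First verbalizable pattern whose sole result is the answer entity."""
    for s, p, o in TRIPLES:
        if s == answer and p in TEMPLATES and matches([(p, o)]) == {answer}:
            return (p, o)
    return None


def distractors(answer, k, rng):
    """Other entities sharing a type with the answer, as wrong options."""
    types = {o for s, p, o in TRIPLES if s == answer and p == "type"}
    pool = sorted({s for s, p, o in TRIPLES if p == "type" and o in types and s != answer})
    return rng.sample(pool, min(k, len(pool)))


def make_question(answer, k=2, seed=0):
    """Build a multiple-choice question dict, or None if no query is unique."""
    query = unique_query(answer)
    if query is None:
        return None
    pred, obj = query
    rng = random.Random(seed)
    options = distractors(answer, k, rng) + [answer]
    rng.shuffle(options)
    return {"question": TEMPLATES[pred].format(obj=obj), "answer": answer, "options": options}


question = make_question("Berlin")
print(question)
```

Choosing distractors from entities of the same type keeps the wrong options plausible. A fuller sketch would combine several triple patterns when no single pattern isolates the answer.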
