Abstract:With the rise of voice chat rooms, a gigantic resource of data can be exposed to the research community for natural language processing tasks. Moderators in voice chat rooms actively monitor the discussions and remove the participants with offensive language. However, it makes the hate speech detection even more difficult since some participants try to find creative ways to articulate hate speech. This makes the hate speech detection challenging in new social media like Clubhouse. To the best of our knowledge all the hate speech datasets have been collected from text resources like Twitter. In this paper, we take the first step to collect a significant dataset from Clubhouse as the rising star in social media industry. We analyze the collected instances from statistical point of view using the Google Perspective Scores. Our experiments show that, the Perspective Scores can outperform Bag of Words and Word2Vec as high level text features.
Abstract:Web services are software systems designed for supporting interoperable dynamic cross-enterprise interactions. The result of attacks to Web services can be catastrophic and causing the disclosure of enterprises' confidential data. As new approaches of attacking arise every day, anomaly detection systems seem to be invaluable tools in this context. The aim of this work has been to target the attacks that reside in the Web service layer and the extensible markup language (XML)-structured simple object access protocol (SOAP) messages. After studying the shortcomings of the existing solutions, a new approach for detecting anomalies in Web services is outlined. More specifically, the proposed technique illustrates how to identify anomalies by employing mining methods on XML-structured SOAP messages. This technique also takes the advantages of tree-based association rule mining to extract knowledge in the training phase, which is used in the test phase to detect anomalies. In addition, this novel composition of techniques brings nearly low false alarm rate while maintaining the detection rate reasonably high, which is shown by a case study.