Jeffery Sorensen

Towards a Unified Framework for Adaptable Problematic Content Detection via Continual Learning

Sep 29, 2023

Ali Omrani, Alireza S. Ziabari, Preni Golazizian, Jeffery Sorensen, Morteza Dehghani

Abstract: Detecting problematic content, such as hate speech, is a multifaceted and ever-changing task, influenced by social dynamics, user populations, diversity of sources, and evolving language. There have been significant efforts, both in academia and in industry, to develop annotated resources that capture various aspects of problematic content. Due to researchers' diverse objectives, the annotations are inconsistent, and hence reports of progress on the detection of problematic content are fragmented. This pattern is expected to persist unless we consolidate resources considering the dynamic nature of the problem. We propose integrating the available resources and leveraging their dynamic nature to break this pattern. In this paper, we introduce a continual learning benchmark and framework for problematic content detection comprising over 84 related tasks encompassing 15 annotation schemas from 8 sources. Our benchmark creates a novel measure of progress: prioritizing the adaptability of classifiers to evolving tasks over excelling in specific tasks. To ensure the continuous relevance of our framework, we designed it so that new tasks can easily be integrated into the benchmark. Our baseline results demonstrate the potential of continual learning in capturing the evolving content and adapting to novel manifestations of problematic content.

WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community

Oct 31, 2018

Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

Abstract: We present a corpus that encompasses the complete history of conversations between contributors to Wikipedia, one of the largest online collaborative communities. By recording the intermediate states of conversations (including not only comments and replies, but also their modifications, deletions, and restorations), this data offers an unprecedented view of online conversation. This level of detail supports new research questions pertaining to the process (and challenges) of large-scale online collaboration. We illustrate the corpus's potential with two case studies that highlight new perspectives on earlier work. First, we explore how a person's conversational behavior depends on how they relate to the discussion's venue. Second, we show that community moderation of toxic behavior happens at a higher rate than previously estimated. Finally, the reconstruction framework is designed to be language agnostic, and we show that it can extract high-quality conversational data in both Chinese and English.

* Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
