Abstract:The eRisk laboratory aims to address issues related to early risk detection on the Web. In this year's edition, three tasks were proposed, where Task 2 was about early detection of signs of anorexia. Early risk detection is a problem where precision and speed are two crucial objectives. Our research group solved Task 2 by defining a CPI+DMC approach, addressing both objectives independently, and a time-aware approach, where precision and speed are considered a combined single-objective. We implemented the last approach by explicitly integrating time during the learning process, considering the ERDE{\theta} metric as the training objective. It also allowed us to incorporate temporal metrics to validate and select the optimal models. We achieved outstanding results for the ERDE50 metric and ranking-based metrics, demonstrating consistency in solving ERD problems.
Abstract:MentalRiskES is a novel challenge that proposes to solve problems related to early risk detection for the Spanish language. The objective is to detect, as soon as possible, Telegram users who show signs of mental disorders considering different tasks. Task 1 involved the users' detection of eating disorders, Task 2 focused on depression detection, and Task 3 aimed at detecting an unknown disorder. These tasks were divided into subtasks, each one defining a resolution approach. Our research group participated in subtask A for Tasks 1 and 2: a binary classification problem that evaluated whether the users were positive or negative. To solve these tasks, we proposed models based on Transformers followed by a decision policy according to criteria defined by an early detection framework. One of the models presented an extended vocabulary with important words for each task to be solved. In addition, we applied a decision policy based on the history of predictions that the model performs during user evaluation. For Tasks 1 and 2, we obtained the second-best performance according to rankings based on classification and latency, demonstrating the effectiveness and consistency of our approaches for solving early detection problems in the Spanish language.
Abstract:The CLEF eRisk Laboratory explores solutions to different tasks related to risk detection on the Internet. In the 2023 edition, Task 1 consisted of searching for symptoms of depression, the objective of which was to extract user writings according to their relevance to the BDI Questionnaire symptoms. Task 2 was related to the problem of early detection of pathological gambling risks, where the participants had to detect users at risk as quickly as possible. Finally, Task 3 consisted of estimating the severity levels of signs of eating disorders. Our research group participated in the first two tasks, proposing solutions based on Transformers. For Task 1, we applied different approaches that can be interesting in information retrieval tasks. Two proposals were based on the similarity of contextualized embedding vectors, and the other one was based on prompting, an attractive current technique of machine learning. For Task 2, we proposed three fine-tuned models followed by decision policy according to criteria defined by an early detection framework. One model presented extended vocabulary with important words to the addressed domain. In the last task, we obtained good performances considering the decision-based metrics, ranking-based metrics, and runtime. In this work, we explore different ways to deploy the predictive potential of Transformers in eRisk tasks.
Abstract:A recently introduced text classifier, called SS3, has obtained state-of-the-art performance on the CLEF's eRisk tasks. SS3 was created to deal with risk detection over text streams and therefore not only supports incremental training and classification but also can visually explain its rationale. However, little attention has been paid to the potential use of SS3 as a general classifier. We believe this could be due to the unavailability of an open-source implementation of SS3. In this work, we introduce PySS3, a package that not only implements SS3 but also comes with visualization tools that allow researchers deploying robust, explainable and trusty machine learning models for text classification.
Abstract:A recently introduced classifier, called SS3, has shown to be well suited to deal with early risk detection (ERD) problems on text streams. It obtained state-of-the-art performance on early depression and anorexia detection on Reddit in the CLEF's eRisk open tasks. SS3 was created to naturally deal with ERD problems since: it supports incremental training and classification over text streams and it can visually explain its rationale. However, SS3 processes the input using a bag-of-word model lacking the ability to recognize important word sequences. This could negatively affect the classification performance and also reduces the descriptiveness of visual explanations. In the standard document classification field, it is very common to use word n-grams to try to overcome some of these limitations. Unfortunately, when working with text streams, using n-grams is not trivial since the system must learn and recognize which n-grams are important ``on the fly''. This paper introduces t-SS3, a variation of SS3 which expands the model to dynamically recognize useful patterns over text streams. We evaluated our model on the eRisk 2017 and 2018 tasks on early depression and anorexia detection. Experimental results show that t-SS3 is able to improve both, existing results and the richness of visual explanations.
Abstract:With the rise of the Internet, there is a growing need to build intelligent systems that are capable of efficiently dealing with early risk detection (ERD) problems on social media, such as early depression detection, early rumor detection or identification of sexual predators. These systems, nowadays mostly based on machine learning techniques, must be able to deal with data streams since users provide their data over time. In addition, these systems must be able to decide when the processed data is sufficient to actually classify users. Moreover, since ERD tasks involve risky decisions by which people's lives could be affected, such systems must also be able to justify their decisions. However, most standard and state-of-the-art supervised machine learning models (such as SVM, MNB, Neural Networks, etc.) are not well suited to deal with this scenario. This is due to the fact that they either act as black boxes or do not support incremental classification/learning. In this paper we introduce SS3, a novel supervised learning model for text classification that naturally supports these aspects. SS3 was designed to be used as a general framework to deal with ERD problems. We evaluated our model on the CLEF's eRisk2017 pilot task on early depression detection. Most of the 30 contributions submitted to this competition used state-of-the-art methods. Experimental results show that our classifier was able to outperform these models and standard classifiers, despite being less computationally expensive and having the ability to explain its rationale.