Maksim Glazkov

Detecting Generated Scientific Papers using an Ensemble of Transformer Models

Sep 17, 2022

Anna Glazkova, Maksim Glazkov

Abstract: This paper describes neural models developed for the DAGPap22 shared task hosted at the Third Workshop on Scholarly Document Processing. The shared task targets the automatic detection of generated scientific papers. Our work focuses on comparing different transformer-based models, as well as on using additional datasets and techniques to deal with imbalanced classes. As our final submission, we used an ensemble of SciBERT, RoBERTa, and DeBERTa fine-tuned with random oversampling. Our model achieved an F1-score of 99.24%. In the official evaluation, our system placed third.

* Accepted to SDP 2022 (Third Workshop on Scholarly Document Processing collocated with COLING 2022)
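
The random oversampling mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' code: the function name `random_oversample`, the seed, and the toy data are assumptions made for illustration. The idea is to duplicate randomly chosen minority-class examples until every class has as many examples as the largest one.

```python
import random
from collections import Counter


def random_oversample(texts, labels, seed=0):
    """Duplicate random minority-class examples until all classes are balanced.

    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = random.Random(seed)

    # Group the examples by class label.
    by_label = {}
    for text, label in zip(texts, labels):
        by_label.setdefault(label, []).append(text)

    # Every class is grown to the size of the largest class.
    target = max(len(items) for items in by_label.values())

    out_texts, out_labels = [], []
    for label, items in by_label.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for text in items + extra:
            out_texts.append(text)
            out_labels.append(label)

    # Shuffle so that duplicated examples are not grouped together.
    order = list(range(len(out_texts)))
    rng.shuffle(order)
    return [out_texts[i] for i in order], [out_labels[i] for i in order]


if __name__ == "__main__":
    toy_texts = ["a", "b", "c", "d", "e"]
    toy_labels = [0, 0, 0, 0, 1]
    _, balanced_labels = random_oversample(toy_texts, toy_labels)
    print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated examples do not leak into validation or test data.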

Fine-tuning of Pre-trained Transformers for Hate, Offensive, and Profane Content Detection in English and Marathi

Oct 25, 2021

Anna Glazkova, Michael Kadantsev, Maksim Glazkov

Abstract: This paper describes neural models developed for the Hate Speech and Offensive Content Identification in English and Indo-Aryan Languages Shared Task 2021. Our team, neuro-utmn-thales, participated in two tasks on binary and fine-grained classification of English tweets that contain hate, offensive, and profane content (English Subtasks A & B) and one task on identification of problematic content in Marathi (Marathi Subtask A). For the English subtasks, we investigate the impact of fine-tuning transformer models on additional hate speech detection corpora. We also apply a one-vs-rest approach based on Twitter-RoBERTa to discriminate between hate, profane, and offensive posts. Our models ranked third in English Subtask A with an F1-score of 81.99% and second in English Subtask B with an F1-score of 65.77%. For the Marathi task, we propose a system based on the Language-Agnostic BERT Sentence Embedding (LaBSE). This model ranked second in Marathi Subtask A with an F1-score of 88.08%.

* Accepted for FIRE'21: Forum for Information Retrieval Evaluation 2021
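
The one-vs-rest idea named in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the authors' system: the keyword-based scorers stand in for fine-tuned binary Twitter-RoBERTa classifiers, and the names `keyword_scorer`, `one_vs_rest_predict`, and `SCORERS`, as well as the placeholder tokens, are hypothetical. Each class gets its own binary scorer, and the class whose scorer is most confident wins.

```python
def keyword_scorer(keywords):
    """Build a toy binary scorer: the share of tokens found in `keywords`.

    Stand-in for a fine-tuned binary classifier (an assumption for this sketch).
    """
    keywords = set(keywords)

    def score(text):
        tokens = text.lower().split()
        if not tokens:
            return 0.0
        return sum(token in keywords for token in tokens) / len(tokens)

    return score


def one_vs_rest_predict(text, scorers):
    """Score the text with one binary scorer per class and pick the top class."""
    scores = {label: scorer(text) for label, scorer in scorers.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# One scorer per class; the placeholder tokens are purely illustrative.
SCORERS = {
    "hate": keyword_scorer({"hateword"}),
    "offensive": keyword_scorer({"rudeword"}),
    "profane": keyword_scorer({"swearword"}),
}

if __name__ == "__main__":
    label, scores = one_vs_rest_predict(
        "that was a swearword swearword rudeword", SCORERS
    )
    print(label, scores)
```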

g2tmn at Constraint@AAAI2021: Exploiting CT-BERT and Ensembling Learning for COVID-19 Fake News Detection

Jan 13, 2021

Anna Glazkova, Maksim Glazkov, Timofey Trifonov

Abstract: The COVID-19 pandemic has had a huge impact on various areas of human life. Hence, the coronavirus pandemic and its consequences are being actively discussed on social media. However, not all social media posts are truthful. Many of them spread fake news that causes panic among readers, misinforms people, and thus exacerbates the effect of the pandemic. In this paper, we present our results at the Constraint@AAAI2021 Shared Task: COVID-19 Fake News Detection in English. In particular, we propose an approach based on a transformer-based ensemble of COVID-Twitter-BERT (CT-BERT) models. We describe the models used, the text preprocessing methods, and the addition of extra data. As a result, our best model achieved a weighted F1-score of 98.69 on the test set (first place on the leaderboard) of this shared task, which attracted 166 submitting teams in total.

* The winning solution at the Constraint shared task (AAAI-2021)
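
The abstract above does not say how the ensemble combines its members' predictions. One common option is soft voting, which averages the class probabilities of all models and picks the class with the highest mean. The sketch below assumes that scheme; the function name `soft_vote` and the probability values are made up for illustration and are not taken from the paper.

```python
def soft_vote(prob_rows):
    """Average per-model class probabilities and return (class index, averages).

    `prob_rows` holds one probability vector per model for a single input.
    """
    if not prob_rows:
        raise ValueError("at least one model prediction is required")
    n_classes = len(prob_rows[0])
    if any(len(row) != n_classes for row in prob_rows):
        raise ValueError("all models must predict the same number of classes")

    averaged = [
        sum(row[c] for row in prob_rows) / len(prob_rows)
        for c in range(n_classes)
    ]
    return averaged.index(max(averaged)), averaged


if __name__ == "__main__":
    # Hypothetical [P(real), P(fake)] outputs of three models for one post.
    predictions = [[0.60, 0.40], [0.20, 0.80], [0.45, 0.55]]
    print(soft_vote(predictions))
```

Unlike majority (hard) voting, soft voting lets one very confident model outweigh two weakly confident ones.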

A Comparative Study of Feature Types for Age-Based Text Classification

Sep 24, 2020

Anna Glazkova, Yury Egorov, Maksim Glazkov

Abstract: The ability to automatically determine the target age audience of a novel provides many opportunities for the development of information retrieval tools. Firstly, developers of book recommendation systems and electronic libraries may be interested in filtering texts by the age of the most likely readers. Further, parents may want to select literature for children. Finally, it will be useful for writers and publishers to determine which features influence whether a text is suitable for children. In this article, we compare the empirical effectiveness of various types of linguistic features for the task of age-based classification of fiction texts. For this purpose, we collected a text corpus of book previews, each labeled with one of two categories: children's or adult. We evaluated the following types of features: readability indices, sentiment, lexical, grammatical and general features, and publishing attributes. The results show that features describing the text at the document level can significantly increase the quality of machine learning models.

* Accepted to AIST-2020 (The 9th International Conference on Analysis of Images, Social Networks and Texts)
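
The abstract above lists readability indices among the feature types but does not name the ones used. As one illustrative example, the sketch below computes the Automated Readability Index (ARI), defined as 4.71 × (characters per word) + 0.5 × (words per sentence) − 21.43. The choice of ARI, the simple regex-based tokenization, and the function name are assumptions for this sketch.

```python
import re


def automated_readability_index(text):
    """Compute the Automated Readability Index of an English text.

    Sentences are split on '.', '!' and '?'; words are runs of letters and
    digits. Both rules are simplifications chosen for this sketch.
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z0-9]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    chars = sum(len(word) for word in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / len(sentences) - 21.43


if __name__ == "__main__":
    simple = "The cat sat. The dog ran."
    dense = "Extraordinary circumstances necessitated comprehensive reconsideration."
    print(automated_readability_index(simple))
    print(automated_readability_index(dense))
```

A score like this could serve as a single numeric feature per document, alongside the other feature types the abstract mentions.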
