Title: Comparing Performance of Different Linguistically-Backed Word Embeddings for Cyberbullying Detection

Jun 04, 2022

Juuso Eronen, Michal Ptaszynski, Fumito Masui

Abstract: In most cases, word embeddings are learned only from raw tokens or, in some cases, lemmas. This includes pre-trained language models like BERT. To investigate the potential of capturing deeper relations between lexical items and structures, and to filter out redundant information, we propose preserving morphological, syntactic, and other types of linguistic information by combining them with the raw tokens or lemmas. This means, for example, including part-of-speech or dependency information in the lexical features. The word embeddings can then be trained on these combinations instead of on raw tokens alone. The method could later be applied to the pre-training of large language models and possibly enhance their performance. This would help in tackling problems that are more sophisticated from the point of view of linguistic representation, such as cyberbullying detection.

* Proceedings of the 2021 International Workshop on Modern Science and Technology, September 29, 2021
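
The feature combination described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `combine_features`, the `_` separator, the tag set, and the hand-annotated example sentence are all assumptions made here for clarity. In practice the annotations would come from a tagger or parser.

```python
from typing import List, Tuple

# Hypothetical annotation format: (token, lemma, POS tag, dependency label).
# The tags below were written by hand for illustration only.
Annotated = Tuple[str, str, str, str]


def combine_features(
    sentence: List[Annotated],
    use_lemma: bool = False,
    use_pos: bool = True,
    use_dep: bool = False,
    sep: str = "_",
) -> List[str]:
    """Join each token (or lemma) with the selected linguistic labels."""
    combined = []
    for token, lemma, pos, dep in sentence:
        parts = [lemma if use_lemma else token]
        if use_pos:
            parts.append(pos)
        if use_dep:
            parts.append(dep)
        combined.append(sep.join(parts))
    return combined


sentence = [
    ("You", "you", "PRON", "nsubj"),
    ("are", "be", "AUX", "cop"),
    ("pathetic", "pathetic", "ADJ", "ROOT"),
]

token_pos = combine_features(sentence)
lemma_pos_dep = combine_features(sentence, use_lemma=True, use_dep=True)
print(token_pos)
print(lemma_pos_dep)
```

Each combined string is treated as a single vocabulary item, so the same surface word with different tags receives a separate embedding. The resulting lists of strings could then be passed to any embedding trainer that accepts pre-tokenized sentences.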
