Francesca Fallucchi

The Dark Side of the Language: Pre-trained Transformers in the DarkNet

Feb 09, 2022

Leonardo Ranaldi, Aria Nourbakhsh, Arianna Patrizi, Elena Sofia Ruzzetti, Dario Onorati, Francesca Fallucchi, Fabio Massimo Zanzotto

Abstract: Pre-trained Transformers are challenging human performance in many natural language processing tasks. The gigantic datasets used for pre-training seem to be the key to their success on existing tasks. In this paper, we explore how a range of pre-trained natural language understanding models perform on truly novel and unexplored data, provided by classification tasks over a DarkNet corpus. Surprisingly, results show that syntactic and lexical neural networks largely outperform pre-trained Transformers. This seems to suggest that pre-trained Transformers have serious difficulties in adapting to radically novel texts.

Lacking the embedding of a word? Look it up into a traditional dictionary

Sep 24, 2021

Elena Sofia Ruzzetti, Leonardo Ranaldi, Michele Mastromattei, Francesca Fallucchi, Fabio Massimo Zanzotto

Abstract: Word embeddings are powerful dictionaries, which may easily capture language variations. However, these dictionaries fail to give sense to rare words, which are surprisingly often covered by traditional dictionaries. In this paper, we propose to use definitions retrieved from traditional dictionaries to produce word embeddings for rare words. For this purpose, we introduce two methods: Definition Neural Network (DefiNNet) and Define BERT (DefBERT). In our experiments, DefiNNet and DefBERT significantly outperform state-of-the-art as well as baseline methods devised for producing embeddings of unknown words. In fact, DefiNNet significantly outperforms FastText, which implements a method for the same task based on n-grams, and DefBERT significantly outperforms the BERT method for OOV words. Thus, definitions in traditional dictionaries are useful for building word embeddings for rare words.
