Title: Reconsidering Token Embeddings with the Definitions for Pre-trained Language Models

Aug 02, 2024

Ying Zhang, Dongyuan Li, Manabu Okumura



Abstract: Learning token embeddings based on token co-occurrence statistics has proven effective for both pre-training and fine-tuning in natural language processing. However, recent studies have pointed out that the distribution of learned embeddings degenerates into anisotropy, and that even pre-trained language models (PLMs) suffer from a loss of semantics-related information in the embeddings of low-frequency tokens. This study first analyzes the fine-tuning dynamics of a PLM, BART-large, and demonstrates its robustness against degeneration. On the basis of this finding, we propose DefinitionEMB, a method that uses definitions to construct isotropically distributed and semantics-related token embeddings for PLMs while maintaining the original robustness during fine-tuning. Our experiments demonstrate the effectiveness of leveraging definitions from Wiktionary to construct such embeddings for RoBERTa-base and BART-large. Furthermore, the constructed embeddings for low-frequency tokens improve the performance of these models across various GLUE datasets and four text summarization datasets.
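To make the notion of anisotropy in the abstract concrete, here is a minimal sketch of one common proxy: the mean cosine similarity over all pairs of embedding vectors. Values near 0 suggest directions spread evenly (isotropic), while values near 1 suggest vectors crowded into a narrow cone (anisotropic). This is an illustration only and is not necessarily the measure used in the paper; the function name `mean_pairwise_cosine` and the toy data are assumptions introduced here.

```python
import math
import random


def mean_pairwise_cosine(embeddings):
    """Average cosine similarity over all distinct pairs of vectors."""
    # Normalize each vector to unit length so a dot product equals a cosine.
    unit = []
    for vec in embeddings:
        norm = math.sqrt(sum(x * x for x in vec))
        unit.append([x / norm for x in vec])

    total, pairs = 0.0, 0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            total += sum(a * b for a, b in zip(unit[i], unit[j]))
            pairs += 1
    return total / pairs


rng = random.Random(0)

# Hypothetical toy data: 200 Gaussian vectors of dimension 16.
isotropic = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(200)]

# A shared offset pushes every vector toward one common direction,
# mimicking the narrow-cone degeneration described in the abstract.
anisotropic = [[x + 3.0 for x in vec] for vec in isotropic]

iso_score = mean_pairwise_cosine(isotropic)      # expected to be close to 0
aniso_score = mean_pairwise_cosine(anisotropic)  # expected to be close to 1
```

For a real model, the same function could be applied to rows of its token embedding matrix, although the quadratic pair loop would need sampling or vectorization at vocabulary scale.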
