Ahmed Alajrami

Understanding the Role of Input Token Characters in Language Models: How Does Information Loss Affect Performance?

Oct 26, 2023

Ahmed Alajrami, Katerina Margatina, Nikolaos Aletras

Abstract: Understanding how and what pre-trained language models (PLMs) learn about language is an open challenge in natural language processing. Previous work has focused on identifying whether they capture semantic and syntactic information, and how the data or the pre-training objective affects their performance. However, to the best of our knowledge, no previous work has specifically examined how information loss in input token characters affects the performance of PLMs. In this study, we address this gap by pre-training language models using small subsets of characters from individual tokens. Surprisingly, we find that even under extreme settings, i.e. pre-training with only one character of each token, performance retention on standard NLU benchmarks and probing tasks relative to full-token models is high. For instance, a model pre-trained only on the single first character of each token retains approximately 90% and 77% of the full-token model's performance on SuperGLUE and GLUE tasks, respectively.

* To appear at EMNLP 2023
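
As a rough illustration of the setup the abstract describes, the sketch below reduces each token to its first character and computes a performance-retention percentage. It is only a minimal sketch under stated assumptions: the function names `keep_prefix` and `performance_retention`, the whitespace tokenization, and the example scores are hypothetical and are not taken from the paper, whose actual tokenizer and evaluation protocol are not given in this listing.

```python
from typing import List


def keep_prefix(tokens: List[str], k: int = 1) -> List[str]:
    """Keep only the first k characters of each token (hypothetical helper).

    With k=1 this mimics the "single first character" setting mentioned
    in the abstract. Splitting on whitespace is an assumption made here
    for illustration only.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [tok[:k] for tok in tokens]


def performance_retention(reduced_score: float, full_score: float) -> float:
    """Return the reduced model's score as a percentage of the full model's."""
    if full_score == 0:
        raise ValueError("full_score must be non-zero")
    return 100.0 * reduced_score / full_score


if __name__ == "__main__":
    tokens = "the cat sat".split()
    print(keep_prefix(tokens))  # ['t', 'c', 's']

    # Made-up scores, chosen only to show the arithmetic.
    print(performance_retention(45.0, 50.0))  # 90.0
```

Retention here is simply the ratio of the two scores, so a figure such as 90% means the reduced-input model reaches nine tenths of the full-token model's benchmark score.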

How does the pre-training objective affect what large language models learn about linguistic properties?

Mar 20, 2022

Ahmed Alajrami, Nikolaos Aletras

Abstract: Several pre-training objectives, such as masked language modeling (MLM), have been proposed to pre-train language models (e.g. BERT) with the aim of learning better language representations. However, to the best of our knowledge, no previous work has investigated how different pre-training objectives affect what BERT learns about linguistic properties. We hypothesize that linguistically motivated objectives such as MLM should help BERT acquire better linguistic knowledge than non-linguistically motivated objectives, i.e. objectives that are not intuitive, or for which it is hard for humans to guess the association between the input and the label to be predicted. To this end, we pre-train BERT with two linguistically motivated objectives and three non-linguistically motivated ones. We then probe for linguistic characteristics encoded in the representations of the resulting models. We find strong evidence that there are only small differences in probing performance between the representations learned by the two types of objectives. These surprising results question the dominant narrative of linguistically informed pre-training.

* Accepted at ACL 2022
