Title: Increasing The Performance of Cognitively Inspired Data-Efficient Language Models via Implicit Structure Building

Oct 31, 2023

Omar Momen, David Arps, Laura Kallmeyer

Abstract: In this paper, we describe our submission to the BabyLM Challenge 2023 shared task on data-efficient language model (LM) pretraining (Warstadt et al., 2023). We train transformer-based masked language models that incorporate unsupervised predictions about hierarchical sentence structure into the model architecture. Concretely, we use the StructFormer architecture (Shen et al., 2021) and variants thereof. StructFormer models have been shown to perform well on unsupervised syntactic induction based on limited pretraining data, and to yield performance improvements over a vanilla transformer architecture (Shen et al., 2021). Evaluation of our models on the 39 tasks provided by the BabyLM Challenge shows promising improvements on some tasks for models that integrate a hierarchical bias into the architecture, even though these models fail to consistently outperform the RoBERTa baseline model provided by the shared task organizers across all tasks.

* Accepted at the BabyLM shared task at CoNLL 2023
