Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nadine El-Naggar

Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages

Oct 14, 2025

Nadine El-Naggar, Tatsuki Kuribayashi, Ted Briscoe

Figure 1 for Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages

Figure 2 for Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages

Figure 3 for Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages

Figure 4 for Which Word Orders Facilitate Length Generalization in LMs? An Investigation with GCG-Based Artificial Languages

Abstract:Whether language models (LMs) have inductive biases that favor typologically frequent grammatical properties over rare, implausible ones has been investigated, typically using artificial languages (ALs) (White and Cotterell, 2021; Kuribayashi et al., 2024). In this paper, we extend these works from two perspectives. First, we extend their context-free AL formalization by adopting Generalized Categorial Grammar (GCG) (Wood, 2014), which allows ALs to cover attested but previously overlooked constructions, such as unbounded dependency and mildly context-sensitive structures. Second, our evaluation focuses more on the generalization ability of LMs to process unseen longer test sentences. Thus, our ALs better capture features of natural languages and our experimental paradigm leads to clearer conclusions -- typologically plausible word orders tend to be easier for LMs to productively generalize.

* EMNLP 2025 Main Conference

Via

Access Paper or Ask Questions

Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks

Apr 07, 2023

Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Figure 1 for Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks

Figure 2 for Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks

Figure 3 for Theoretical Conditions and Empirical Failure of Bracket Counting on Long Sequences with Linear Recurrent Networks

Abstract:Previous work has established that RNNs with an unbounded activation function have the capacity to count exactly. However, it has also been shown that RNNs are challenging to train effectively and generally do not learn exact counting behaviour. In this paper, we focus on this problem by studying the simplest possible RNN, a linear single-cell network. We conduct a theoretical analysis of linear RNNs and identify conditions for the models to exhibit exact counting behaviour. We provide a formal proof that these conditions are necessary and sufficient. We also conduct an empirical analysis using tasks involving a Dyck-1-like Balanced Bracket language under two different settings. We observe that linear RNNs generally do not meet the necessary and sufficient conditions for counting behaviour when trained with the standard approach. We investigate how varying the length of training sequences and utilising different target classes impacts model behaviour during training and the ability of linear RNN models to effectively approximate the indicator conditions.

* 17th Conference of the European Chapter of the Association for Computational Linguistics Student Research Workshop (EACL 2023 SRW)

Via

Access Paper or Ask Questions

Exploring the Long-Term Generalization of Counting Behavior in RNNs

Nov 29, 2022

Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde

Figure 1 for Exploring the Long-Term Generalization of Counting Behavior in RNNs

Figure 2 for Exploring the Long-Term Generalization of Counting Behavior in RNNs

Figure 3 for Exploring the Long-Term Generalization of Counting Behavior in RNNs

Figure 4 for Exploring the Long-Term Generalization of Counting Behavior in RNNs

Abstract:In this study, we investigate the generalization of LSTM, ReLU and GRU models on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs have the capacity for counting with suitable configuration, while GRUs have limitations that prevent correct counting over longer sequences. Despite this and some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than in the training data. ReLUs show much larger variance in behavior and in most cases worse generalization. The long sequence generalization is empirically related to validation loss, but reliable long sequence generalization seems not practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs and ReLUs. In particular, we observe that the saturation of activation functions in LSTMs and the correct weight setting for ReLUs to generalize counting behavior are not achieved in standard training regimens. In summary, learning generalizable counting behavior is still an open problem and we discuss potential approaches for further research.

* Published in I Can't Believe It's Not Better: Understanding Deep Learning Through Empirical Falsification Workshop at NeurIPS 2022

Via

Access Paper or Ask Questions