Title: Exploring the Long-Term Generalization of Counting Behavior in RNNs

Nov 29, 2022

Nadine El-Naggar, Pranava Madhyastha, Tillman Weyde



Abstract: In this study, we investigate how well LSTM, ReLU RNN, and GRU models generalize on counting tasks over long sequences. Previous theoretical work has established that RNNs with ReLU activation and LSTMs can count when suitably configured, while GRUs have limitations that prevent correct counting over longer sequences. Despite this, and despite some positive empirical results for LSTMs on Dyck-1 languages, our experimental results show that LSTMs fail to learn correct counting behavior for sequences that are significantly longer than those in the training data. ReLU RNNs show much larger variance in behavior and, in most cases, worse generalization. Long-sequence generalization is empirically related to validation loss, but reliable long-sequence generalization does not seem practically achievable through backpropagation with current techniques. We demonstrate different failure modes for LSTMs, GRUs, and ReLU RNNs. In particular, we observe that standard training regimens achieve neither the saturation of activation functions that LSTMs need nor the exact weight setting that ReLU RNNs need to generalize counting behavior. In summary, learning generalizable counting behavior is still an open problem, and we discuss potential approaches for further research.
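To make the point about weight settings concrete, here is a minimal sketch of a single-unit ReLU recurrence that tracks bracket depth on Dyck-1 style input. It is not the authors' model or experimental setup: the function name `relu_counter`, the scalar weights `w_rec`, `w_open`, and `w_close`, and the test strings are all hypothetical, chosen only for illustration. With exactly matched weights the counter returns to zero on a balanced string of any length, while a small mismatch produces an error that grows with sequence length.

```python
def relu(x: float) -> float:
    """Rectified linear unit."""
    return max(0.0, x)


def relu_counter(
    seq: str,
    w_rec: float = 1.0,    # recurrent weight (assumed value, not from the paper)
    w_open: float = 1.0,   # input weight for "(" (assumed)
    w_close: float = -1.0, # input weight for any other symbol, treated as ")" (assumed)
) -> float:
    """Run h_t = ReLU(w_rec * h_{t-1} + x_t) over seq and return the final state."""
    h = 0.0
    for ch in seq:
        x = w_open if ch == "(" else w_close
        h = relu(w_rec * h + x)
    return h


# Balanced strings: n opening brackets followed by n closing brackets.
short_seq = "(" * 5 + ")" * 5
long_seq = "(" * 1000 + ")" * 1000

# Exactly matched weights: the depth returns to zero at any length.
exact_short = relu_counter(short_seq)
exact_long = relu_counter(long_seq)

# Slightly mismatched closing weight: the residual is small on the short
# string but scales with the number of brackets on the long one.
drift_short = relu_counter(short_seq, w_close=-0.99)
drift_long = relu_counter(long_seq, w_close=-0.99)

print(exact_short, exact_long)
print(drift_short, drift_long)
```

In this toy setting, a 1% weight mismatch leaves a residual of about 0.05 after 5 bracket pairs but about 10 after 1000 pairs, so a model that looks correct at training lengths can still miscount on much longer sequences.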

* Published in I Can't Believe It's Not Better: Understanding Deep Learning Through Empirical Falsification Workshop at NeurIPS 2022

View paper on OpenReview
