Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Matīss Apinis

Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Apr 06, 2020

Andis Draguns, Emīls Ozoliņš, Agris Šostaks, Matīss Apinis, Karlis Freivalds

Figure 1 for Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Figure 2 for Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Figure 3 for Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Figure 4 for Residual Shuffle-Exchange Networks for Fast Processing of Long Sequences

Abstract:Attention is a commonly used mechanism in sequence processing, but it is of O(n^2) complexity which prevents its application to long sequences. The recently introduced Neural Shuffle-Exchange network offers a computation-efficient alternative, enabling the modelling of long-range dependencies in O(n log n) time. The model, however, is quite complex, involving a sophisticated gating mechanism derived from Gated Recurrent Unit. In this paper, we present a simple and lightweight variant of the Shuffle-Exchange network, which is based on a residual network employing GELU and Layer Normalization. The proposed architecture not only scales to longer sequences but also converges faster and provides better accuracy. It surpasses Shuffle-Exchange network on the LAMBADA language modelling task and achieves state-of-the-art performance on the MusicNet dataset for music transcription while using significantly fewer parameters. We show how to combine Shuffle-Exchange network with convolutional layers establishing it as a useful building block in long sequence processing applications.

* arXiv admin note: text overlap with arXiv:1907.07897

Via

Access Paper or Ask Questions