Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Patrick Doetsch

A comprehensive study of batch construction strategies for recurrent neural networks in MXNet

May 05, 2017

Patrick Doetsch, Pavel Golik, Hermann Ney

Figure 1 for A comprehensive study of batch construction strategies for recurrent neural networks in MXNet

Figure 2 for A comprehensive study of batch construction strategies for recurrent neural networks in MXNet

Figure 3 for A comprehensive study of batch construction strategies for recurrent neural networks in MXNet

Abstract:In this work we compare different batch construction methods for mini-batch training of recurrent neural networks. While popular implementations like TensorFlow and MXNet suggest a bucketing approach to improve the parallelization capabilities of the recurrent training process, we propose a simple ordering strategy that arranges the training sequences in a stochastic alternatingly sorted way. We compare our method to sequence bucketing as well as various other batch construction strategies on the CHiME-4 noisy speech recognition corpus. The experiments show that our alternated sorting approach is able to compete both in training time and recognition performance while being conceptually simpler to implement.

Via

Access Paper or Ask Questions

A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

Mar 29, 2017

Albert Zeyer, Patrick Doetsch, Paul Voigtlaender, Ralf Schlüter, Hermann Ney

Figure 1 for A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

Figure 2 for A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

Figure 3 for A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

Figure 4 for A Comprehensive Study of Deep Bidirectional LSTM RNNs for Acoustic Modeling in Speech Recognition

Abstract:We present a comprehensive study of deep bidirectional long short-term memory (LSTM) recurrent neural network (RNN) based acoustic models for automatic speech recognition (ASR). We study the effect of size and depth and train models of up to 8 layers. We investigate the training aspect and study different variants of optimization methods, batching, truncated backpropagation, different regularization techniques such as dropout and $L_2$ regularization, and different gradient clipping variants. The major part of the experimental analysis was performed on the Quaero corpus. Additional experiments also were performed on the Switchboard corpus. Our best LSTM model has a relative improvement in word error rate of over 14\% compared to our best feed-forward neural network (FFNN) baseline on the Quaero task. On this task, we get our best result with an 8 layer bidirectional LSTM and we show that a pretraining scheme with layer-wise construction helps for deep LSTMs. Finally we compare the training calculation time of many of the presented experiments in relation with recognition performance. All the experiments were done with RETURNN, the RWTH extensible training framework for universal recurrent neural networks in combination with RASR, the RWTH ASR toolkit.

* published on ICASSP 2017 conference, New Orleans, USA

Via

Access Paper or Ask Questions

RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Jan 10, 2017

Patrick Doetsch, Albert Zeyer, Paul Voigtlaender, Ilya Kulikov, Ralf Schlüter, Hermann Ney

Figure 1 for RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Figure 2 for RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Figure 3 for RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Figure 4 for RETURNN: The RWTH Extensible Training framework for Universal Recurrent Neural Networks

Abstract:In this work we release our extensible and easily configurable neural network training software. It provides a rich set of functional layers with a particular focus on efficient training of recurrent neural network topologies on multiple GPUs. The source of the software package is public and freely available for academic research purposes and can be used as a framework or as a standalone tool which supports a flexible configuration. The software allows to train state-of-the-art deep bidirectional long short-term memory (LSTM) models on both one dimensional data like speech or two dimensional data like handwritten text and was used to develop successful submission systems in several evaluation campaigns.

Via

Access Paper or Ask Questions