Title: Ouroboros: On Accelerating Training of Transformer-Based Language Models

Sep 14, 2019

Qian Yang, Zhouyuan Huo, Wenlin Wang, Heng Huang, Lawrence Carin

Abstract: Language models are essential for natural language processing (NLP) tasks such as machine translation and text summarization. Transformer-based language models with over a billion parameters have recently demonstrated remarkable performance across many NLP domains, verifying the benefits of model size. Model parallelism is required when a model is too large to fit on a single computing device. Current methods for model parallelism either suffer from backward locking in backpropagation or are not applicable to language models. We propose the first model-parallel algorithm that speeds up the training of Transformer-based language models. We also prove that the proposed algorithm is guaranteed to converge to critical points for non-convex problems. Extensive experiments on Transformer and Transformer-XL language models demonstrate that the proposed algorithm achieves a substantial speedup beyond what data parallelism provides, with comparable or better accuracy. Code to reproduce the experiments is available at https://github.com/LaraQianYang/Ouroboros.

* To appear in the proceedings of Neural Information Processing Systems Conference (2019)
