Title: Scaling Pre-trained Language Models to Deeper via Parameter-efficient Architecture

Apr 11, 2023

Peiyu Liu, Ze-Feng Gao, Yushuo Chen, Wayne Xin Zhao, Ji-Rong Wen

Abstract: In this paper, we propose a highly parameter-efficient approach to scaling pre-trained language models (PLMs) to greater depth. Unlike prior work that shares all parameters or uses extra blocks, we design a more capable parameter-sharing architecture based on the matrix product operator (MPO). MPO decomposition reorganizes and factorizes the information of a parameter matrix into two parts: a central tensor that holds most of the information, and auxiliary tensors that hold only a small proportion of the parameters. Based on this decomposition, our architecture shares the central tensor across all layers to reduce the model size, while keeping layer-specific auxiliary tensors (also using adapters) to enhance adaptation flexibility. To improve training, we further propose a stable initialization algorithm tailored to the MPO-based architecture. Extensive experiments demonstrate the effectiveness of the proposed model in reducing model size while achieving highly competitive performance.

* 14 pages, 4 figures, 6 tables
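
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes a generic sequential-SVD factorization without truncation, a toy 16 x 16 matrix, and a hypothetical depth of 12 layers. The function names `mpo_decompose` and `mpo_reconstruct` and all dimensions are chosen here for illustration only, and the adapters and the initialization algorithm mentioned in the abstract are not covered.

```python
import numpy as np

def mpo_decompose(W, in_dims, out_dims):
    """Factor W of shape (prod(in_dims), prod(out_dims)) into a chain of
    4-way local tensors of shape (left_rank, in_k, out_k, right_rank)."""
    n = len(in_dims)
    T = W.reshape(*in_dims, *out_dims)
    # Interleave axes to (i1, j1, i2, j2, ..., in, jn).
    perm = [axis for pair in zip(range(n), range(n, 2 * n)) for axis in pair]
    rest = T.transpose(perm)
    cores, rank = [], 1
    for k in range(n - 1):
        rest = rest.reshape(rank * in_dims[k] * out_dims[k], -1)
        U, S, Vt = np.linalg.svd(rest, full_matrices=False)
        cores.append(U.reshape(rank, in_dims[k], out_dims[k], len(S)))
        rest = S[:, None] * Vt  # carry the singular values to the right
        rank = len(S)
    cores.append(rest.reshape(rank, in_dims[-1], out_dims[-1], 1))
    return cores

def mpo_reconstruct(cores):
    """Contract the chain back into a 2-D matrix."""
    T = cores[0]
    for core in cores[1:]:
        T = np.tensordot(T, core, axes=([-1], [0]))
    T = T[0, ..., 0]  # drop the two boundary bonds of size 1
    n = len(cores)
    perm = list(range(0, 2 * n, 2)) + list(range(1, 2 * n, 2))
    in_size = int(np.prod([c.shape[1] for c in cores]))
    return T.transpose(perm).reshape(in_size, -1)

# Toy example (assumed sizes): a 16 x 16 matrix split into three local tensors.
rng = np.random.default_rng(0)
W = rng.standard_normal((16, 16))
cores = mpo_decompose(W, in_dims=(2, 4, 2), out_dims=(2, 4, 2))
W_rebuilt = mpo_reconstruct(cores)

# The middle tensor plays the role of the central tensor; the outer two
# play the role of the auxiliary tensors.
central_params = cores[1].size                    # 4*4*4*4 = 256
auxiliary_params = cores[0].size + cores[2].size  # 16 + 16 = 32

# Hypothetical depth: one shared central tensor plus per-layer auxiliaries,
# compared with a separate full matrix in every layer.
num_layers = 12
shared_total = central_params + num_layers * auxiliary_params  # 640
unshared_total = num_layers * W.size                           # 3072
print(np.allclose(W, W_rebuilt), shared_total, unshared_total)
```

In this toy setting the untruncated factorization of a single matrix has slightly more parameters than the matrix itself (288 versus 256). The saving comes only from reusing the central tensor across layers, which is the kind of sharing the abstract describes.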
