Abstract: Multilingual Machine Translation promises to improve translation quality between non-English languages. This is advantageous for several reasons, namely lower latency (no need to translate twice) and reduced error cascades (e.g., avoiding the loss of gender and formality information when translating through English). On the downside, adding more languages reduces the model capacity available per language, which is usually countered by increasing the overall model size, making training harder and inference slower. In this work, we introduce Language-Specific Transformer Layers (LSLs), which allow us to increase model capacity while keeping the amount of computation and the number of parameters used in the forward pass constant. The key idea is to make some layers of the encoder source- or target-language-specific while keeping the remaining layers shared. We study the best placement of these layers using an approach inspired by neural architecture search, and achieve an improvement of 1.3 chrF (1.5 spBLEU) points over not using LSLs on a separate-decoder architecture, and 1.9 chrF (2.2 spBLEU) on a shared-decoder one.
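To make the routing idea concrete, below is a minimal PyTorch-style sketch of a language-specific encoder layer that keeps one Transformer layer per language and runs only the one selected by the language index, so compute and forward-pass parameters stay constant. The class and argument names (LanguageSpecificLayer, lang_id) and the layer sizes are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of a Language-Specific Layer (LSL); names and sizes are assumptions.
import torch
import torch.nn as nn

class LanguageSpecificLayer(nn.Module):
    """One encoder layer per language; only the selected layer runs in the
    forward pass, so per-example compute matches a single shared layer."""
    def __init__(self, num_languages: int, d_model: int = 512, nhead: int = 8):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
            for _ in range(num_languages)
        )

    def forward(self, x: torch.Tensor, lang_id: int) -> torch.Tensor:
        # Route the batch through the layer indexed by the source (or target)
        # language; the other per-language layers are untouched.
        return self.layers[lang_id](x)

# Usage: mix shared and language-specific layers in the encoder stack.
shared = nn.TransformerEncoderLayer(512, 8, batch_first=True)
lsl = LanguageSpecificLayer(num_languages=10)
x = torch.randn(2, 7, 512)       # (batch, sequence, d_model)
h = lsl(shared(x), lang_id=3)    # shared layer, then the LSL for language 3
```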
Abstract: Large Transformer models have achieved state-of-the-art results in neural machine translation and have become standard in the field. In this work, we look for the optimal combination of known techniques to optimize inference speed without sacrificing translation quality. We conduct an empirical study that stacks various approaches and demonstrates that a combination of replacing decoder self-attention with simplified recurrent units, adopting a deep-encoder and shallow-decoder architecture, and pruning multi-head attention can achieve up to 109% and 84% speedups on CPU and GPU, respectively, and reduce the number of parameters by 25%, while maintaining the same translation quality in terms of BLEU.
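As an illustration of the decoder-side change, the sketch below implements a simplified recurrent unit with a single forget gate and a ReLU output (in the spirit of the SSRU) that could stand in for decoder self-attention, plus an example deep-encoder/shallow-decoder split. The exact gate equations and the layer counts are assumptions for illustration, not the paper's code.

```python
# Hedged sketch of a simplified recurrent unit replacing decoder self-attention.
import torch
import torch.nn as nn

class SimplifiedRecurrentUnit(nn.Module):
    """Single forget gate + ReLU output; no attention over previous tokens,
    so each decoding step costs O(1) instead of O(target length)."""
    def __init__(self, d_model: int = 512):
        super().__init__()
        self.w = nn.Linear(d_model, d_model, bias=False)
        self.w_f = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, target_len, d_model); recurrence runs over target positions.
        outputs, c = [], torch.zeros_like(x[:, 0])
        for t in range(x.size(1)):
            f = torch.sigmoid(self.w_f(x[:, t]))        # forget gate
            c = f * c + (1.0 - f) * self.w(x[:, t])     # cell update
            outputs.append(torch.relu(c))
        return torch.stack(outputs, dim=1)

# Deep encoder, shallow decoder: e.g. many encoder layers, one decoder layer,
# with the decoder's self-attention block swapped for the unit above.
encoder_layers, decoder_layers = 12, 1  # assumed split, for illustration only
```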
Abstract: In this paper, we propose the Structured Deep Neural Network (structured DNN) as a structured and deep learning framework. This approach can learn to find the best structured object (such as a label sequence) given a structured input (such as a vector sequence) by globally considering the mapping relationships between the structures rather than item by item. When automatic speech recognition (ASR) is viewed as a special case of such a structured learning problem, with the acoustic vector sequence as the input and the phoneme label sequence as the output, it becomes possible to comprehensively learn utterance by utterance as a whole, rather than frame by frame. The Structured Support Vector Machine (structured SVM) was previously proposed to perform ASR with structured learning, but it is limited by the linear nature of the SVM. Here we propose the structured DNN, which uses nonlinear transformations in multiple layers as a structured and deep learning approach. In preliminary experiments on TIMIT, this approach was shown to outperform the structured SVM.
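A rough sketch of the structured-DNN idea follows: build a fixed-size joint representation of a whole (acoustic vector sequence, phoneme label sequence) pair, score it with a multi-layer nonlinear network, and keep the best-scoring candidate label sequence. The pooled joint feature function, the candidate list, and the dimensions are illustrative assumptions, not the paper's exact formulation.

```python
# Hedged sketch: score whole (sequence, labeling) pairs instead of single frames.
import torch
import torch.nn as nn

NUM_PHONES, D_ACOUSTIC = 48, 39  # assumed TIMIT-style phone set and MFCC size

def joint_features(x: torch.Tensor, y: list[int]) -> torch.Tensor:
    """Pool acoustic frames by their hypothesized phone label: one simple
    choice of a fixed-size joint representation Psi(x, y)."""
    psi = torch.zeros(NUM_PHONES, D_ACOUSTIC)
    for frame, label in zip(x, y):
        psi[label] += frame
    return psi.flatten()

scorer = nn.Sequential(                      # nonlinear, multi-layer scorer
    nn.Linear(NUM_PHONES * D_ACOUSTIC, 256),
    nn.ReLU(),
    nn.Linear(256, 1),
)

def best_sequence(x: torch.Tensor, candidates: list[list[int]]) -> list[int]:
    # Inference: globally score each candidate label sequence for the whole
    # utterance and keep the best, rather than classifying frame by frame.
    scores = [scorer(joint_features(x, y)) for y in candidates]
    return candidates[int(torch.stack(scores).argmax())]
```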