Title: Improved Self-Supervised Multilingual Speech Representation Learning Combined with Auxiliary Language Information

Dec 07, 2022

Fenglin Ding, Genshun Wan, Pengcheng Li, Jia Pan, Cong Liu

Abstract: Multilingual end-to-end models have shown great improvement over monolingual systems. With the development of speech pre-training methods, self-supervised multilingual speech representation learning such as XLSR has succeeded in improving the performance of multilingual automatic speech recognition (ASR). However, as in supervised learning, multilingual pre-training may also suffer from language interference, which further affects the application of multilingual systems. In this paper, we introduce several techniques for improving self-supervised multilingual pre-training by leveraging auxiliary language information, including language adversarial training, language embedding, and language adaptive training during the pre-training stage. We conduct experiments on a multilingual ASR task consisting of 16 languages. Our experimental results demonstrate a 14.3% relative gain over the standard XLSR model and a 19.8% relative gain over the multilingual model without pre-training.

* Submitted to ICASSP 2023
