Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Title:Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character

Jan 26, 2022

Zhao Yang, Wei Xi, Rui Wang, Rui Jiang, Jizhong Zhao

Figure 1 for Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character

Figure 2 for Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character

Figure 3 for Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character

Figure 4 for Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character

Share this with someone who'll enjoy it:

Abstract:End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect the use of specific language characteristics. For Mandarin Chinese ASR tasks, pinyin and character as writing and spelling systems respectively are mutual promotion in the Mandarin Chinese language. Based on the above intuition, we investigate types of related models that are suitable but not for joint pinyin-character ASR and propose a novel Mandarin Chinese ASR model with dual-decoder Transformer according to the characteristics of the pinyin transcripts and character transcripts. Specifically, the joint pinyin-character layer-wise linear interactive (LWLI) module and phonetic posteriorgrams adapter (PPGA) are proposed to achieve inter-layer multi-level interaction by adaptively fusing pinyin and character information. Furthermore, a two-stage training strategy is proposed to make training more stable and faster convergence. The results on the test sets of AISHELL-1 dataset show that the proposed Speech-Pinyin-Character-Interaction (SPCI) model without a language model achieves 9.85% character error rate (CER) on the test set, which is 17.71% relative reduction compared to baseline models based on Transformer.

View paper on

Share this with someone who'll enjoy it:

Title:Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character

Paper and Code