Title: Text-Aware End-to-end Mispronunciation Detection and Diagnosis

Jun 15, 2022

Linkai Peng, Yingming Gao, Binghuai Lin, Dengfeng Ke, Yanlu Xie, Jinsong Zhang

Abstract: Mispronunciation detection and diagnosis (MDD) technology is a key component of computer-assisted pronunciation training (CAPT) systems. When assessing the pronunciation quality of constrained speech, the given transcriptions can play the role of a teacher. Conventional methods, such as forced alignment and extended recognition networks, make full use of the prior text for model construction or to improve system performance. Recently, some end-to-end methods have incorporated the prior text into model training and shown preliminary effectiveness. However, previous studies mostly apply a raw attention mechanism to fuse audio representations with text representations, without taking possible text-pronunciation mismatch into account. In this paper, we present a gating strategy that assigns more importance to the relevant audio features while suppressing irrelevant text information. Moreover, given the transcriptions, we design an extra contrastive loss to reduce the gap between the learning objectives of phoneme recognition and MDD. We conducted experiments using two publicly available datasets (TIMIT and L2-Arctic), and our best model improved the F1 score from $57.51\%$ to $61.75\%$ over the baselines. We also provide a detailed analysis to shed light on the effectiveness of the gating mechanism and contrastive learning for MDD.
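
The abstract does not give the exact form of the gating strategy, so the following is only a minimal sketch of one generic way to gate text context into audio features, not the authors' model. It assumes scaled dot-product attention from each audio frame over the text embeddings and a sigmoid gate computed from the concatenated frame and context; the names `gated_fusion`, `w_gate`, and `b_gate` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gated_fusion(audio, text, w_gate, b_gate):
    """Fuse audio frames with attended text context through a sigmoid gate.

    audio:  list of T frame vectors, each a list of d floats
    text:   list of N text-embedding vectors, each a list of d floats
    w_gate: d rows of 2*d floats (hypothetical learned gate weights)
    b_gate: list of d floats (hypothetical learned gate biases)
    Returns (fused, gates), both lists of T vectors of length d.
    """
    d = len(audio[0])
    scale = math.sqrt(d)
    fused, gates = [], []
    for frame in audio:
        # Attention: the audio frame queries the text sequence.
        weights = softmax([dot(frame, t) / scale for t in text])
        context = [sum(w * t[j] for w, t in zip(weights, text)) for j in range(d)]
        # Gate is computed from the concatenated frame and text context.
        joint = frame + context
        gate = [sigmoid(dot(row, joint) + b) for row, b in zip(w_gate, b_gate)]
        # A gate near 0 keeps the audio feature alone; near 1 adds the text context.
        fused.append([a + g * c for a, g, c in zip(frame, gate, context)])
        gates.append(gate)
    return fused, gates


# Toy usage with random features (assumed sizes: 5 frames, 3 phonemes, d = 4).
random.seed(0)
d = 4
audio = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
text = [[random.gauss(0, 1) for _ in range(d)] for _ in range(3)]
w_gate = [[random.gauss(0, 0.1) for _ in range(2 * d)] for _ in range(d)]
b_gate = [0.0] * d
fused, gates = gated_fusion(audio, text, w_gate, b_gate)
```

In this sketch a strongly negative gate bias drives the gate toward zero, so the fused output falls back to the audio features alone; that is the behaviour one would want when the transcription does not match what the learner actually said.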

* Rejected by Interspeech 2022
