Szymon Zaporowski

Mispronunciation Detection in Non-native (L2) English with Uncertainty Modeling

Feb 08, 2021

Daniel Korzekwa, Jaime Lorenzo-Trueba, Szymon Zaporowski, Shira Calamaro, Thomas Drugman, Bozena Kostek

Abstract: A common approach to the automatic detection of mispronunciation in language learning is to recognize the phonemes produced by a student and compare them to the expected pronunciation of a native speaker. This approach makes two simplifying assumptions: a) phonemes can be recognized from speech with high accuracy, b) there is a single correct way for a sentence to be pronounced. These assumptions do not always hold, which can result in a significant number of false mispronunciation alarms. We propose a novel approach to overcome this problem based on two principles: a) taking into account uncertainty in the automatic phoneme recognition step, b) accounting for the fact that there may be multiple valid pronunciations. We evaluate the model on non-native (L2) English speech of German, Italian and Polish speakers, where it is shown to increase the precision of detecting mispronunciations by up to 18% (relative) compared to the common approach.

* Accepted to ICASSP 2021
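
The two principles in the abstract above can be illustrated with a small Python sketch. This is not the authors' model: it assumes the recognizer's output is already aligned to one phoneme distribution per position, and the names `pronunciation_log_prob`, `is_mispronounced` and the `threshold` value are hypothetical. It scores every valid pronunciation against the full phoneme posteriors rather than the single best guess, and raises an alarm only when none of the valid pronunciations is likely.

```python
import math


def pronunciation_log_prob(posteriors, phonemes):
    """Log-probability of a phoneme sequence under per-position posteriors.

    posteriors: list of dicts mapping phoneme -> probability (assumed aligned,
                one dict per expected phoneme position).
    phonemes:   candidate pronunciation as a list of phoneme symbols.
    """
    if len(posteriors) != len(phonemes):
        return float("-inf")
    total = 0.0
    for dist, phoneme in zip(posteriors, phonemes):
        p = dist.get(phoneme, 0.0)
        if p <= 0.0:
            return float("-inf")
        total += math.log(p)
    return total


def is_mispronounced(posteriors, valid_pronunciations, threshold=0.5):
    """Flag a word only if no valid pronunciation is likely enough.

    The score is the geometric mean of per-phoneme probabilities for the
    best-matching valid pronunciation; `threshold` is an illustrative value.
    """
    best = max(pronunciation_log_prob(posteriors, v) for v in valid_pronunciations)
    if best == float("-inf"):
        return True
    score = math.exp(best / len(posteriors))
    return score < threshold


# Hypothetical posteriors for the word "either", which has two valid
# pronunciations in English (starting with "iy" or with "ay").
posteriors = [
    {"iy": 0.45, "ay": 0.50, "eh": 0.05},
    {"dh": 0.90, "d": 0.10},
    {"er": 0.95, "ah": 0.05},
]
valid = [["iy", "dh", "er"], ["ay", "dh", "er"]]
print(is_mispronounced(posteriors, valid))
```

A single-reference comparison against `["iy", "dh", "er"]` using only the top-1 recognized phonemes would raise an alarm here, because the recognizer's best guess for the first phoneme is `ay`. Scoring against both valid pronunciations avoids that false alarm.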

Detection of Lexical Stress Errors in Non-native English with Data Augmentation and Attention

Dec 29, 2020

Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

Abstract: This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS). In a classical approach, audio features are usually extracted from fixed regions of speech such as the syllable nucleus. We propose an attention-based deep learning model that automatically derives an optimal syllable-level representation from frame-level and phoneme-level audio features. Training this model is challenging because of the limited number of incorrect stress patterns. To solve this problem, we propose to augment the training set with incorrectly stressed words generated with Neural TTS. Combining both techniques achieves 94.8% precision and 49.2% recall for the detection of incorrectly stressed words in L2 English speech of Slavic speakers.

* Submitted to ICASSP 2021
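
The idea of deriving a syllable-level representation from frame-level features with attention, rather than from a fixed region, can be sketched in a few lines of Python. This is a generic dot-product attention pooling example, not the architecture from the paper: the function names are hypothetical, and the `query` vector, which would be learned during training in a real model, is fixed by hand here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames, query):
    """Pool frame-level feature vectors into one syllable-level vector.

    frames: list of equal-length feature vectors, one per audio frame.
    query:  vector of the same length (learned in practice, fixed here).
    Returns the pooled vector and the attention weight of each frame.
    """
    # Relevance of each frame: dot product with the query vector.
    scores = [sum(f_i * q_i for f_i, q_i in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frames, computed one feature dimension at a time.
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]
    return pooled, weights


# Three hypothetical 2-dimensional frames belonging to one syllable.
frames = [[0.2, 1.0], [0.9, 0.1], [0.4, 0.5]]
pooled, weights = attention_pool(frames, query=[2.0, 0.0])
print(pooled, weights)
```

With a zero query every frame gets the same weight and the result is a plain average. A non-zero query shifts weight toward frames that align with it, which is what lets a trained model focus on the parts of a syllable that matter for stress.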
