Title: Leveraging Allophony in Self-Supervised Speech Models for Atypical Pronunciation Assessment

Feb 10, 2025

Kwanghee Choi, Eunjung Yeo, Kalvin Chang, Shinji Watanabe, David Mortensen

Abstract: Allophony refers to the variation in the phonetic realization of a phoneme based on its phonetic environment. Modeling allophones is crucial for atypical pronunciation assessment, which involves distinguishing atypical from typical pronunciations. However, recent phoneme classifier-based approaches often simplify this by treating various realizations as a single phoneme, bypassing the complexity of modeling allophonic variation. Motivated by the acoustic modeling capabilities of frozen self-supervised speech model (S3M) features, we propose MixGoP, a novel approach that leverages Gaussian mixture models to model phoneme distributions with multiple subclusters. Our experiments show that MixGoP achieves state-of-the-art performance across four out of five datasets, including dysarthric and non-native speech. Our analysis further suggests that S3M features capture allophonic variation more effectively than MFCCs and Mel spectrograms, highlighting the benefits of integrating MixGoP with S3M features.

* Accepted to NAACL 2025. Codebase available at https://github.com/juice500ml/acoustic-units-for-ood
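
The idea of modeling one phoneme with several subclusters can be illustrated with a toy sketch. This is not the authors' MixGoP implementation (see the codebase linked above for that). It assumes scalar "features" in place of real S3M feature vectors, a hand-written 1-D EM fit, and a score defined as the log-likelihood of a feature under the phoneme's mixture. The names `fit_gmm_1d` and `score` and the synthetic data are hypothetical.

```python
import math
import random


def log_gauss(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def logsumexp(vals):
    """Numerically stable log(sum(exp(v)))."""
    m = max(vals)
    return m + math.log(sum(math.exp(v - m) for v in vals))


def fit_gmm_1d(xs, k, n_iter=50):
    """Fit a k-component 1-D Gaussian mixture with EM.

    Returns a list of (weight, mean, variance) tuples.
    """
    n = len(xs)
    ordered = sorted(xs)
    # Deterministic initialisation: spread the means over quantiles.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(xs) / n
    var0 = sum((x - centre) ** 2 for x in xs) / n + 1e-6
    comps = [(1.0 / k, m, var0) for m in means]
    for _ in range(n_iter):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in xs:
            logs = [math.log(w) + log_gauss(x, m, v) for w, m, v in comps]
            norm = logsumexp(logs)
            resp.append([math.exp(l - norm) for l in logs])
        # M-step: re-estimate weight, mean, and variance per component.
        updated = []
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            mean = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, xs)) / nj
            updated.append((nj / n, mean, var + 1e-6))
        comps = updated
    return comps


def score(x, comps):
    """Log-likelihood of x under the mixture (higher = more typical)."""
    return logsumexp([math.log(w) + log_gauss(x, m, v) for w, m, v in comps])


# Synthetic typical realisations of one hypothetical phoneme with two
# allophones, clustered around -2 and +2 (illustrative values only).
rng = random.Random(0)
typical = [rng.gauss(-2.0, 0.3) for _ in range(200)]
typical += [rng.gauss(2.0, 0.3) for _ in range(200)]

mixture = fit_gmm_1d(typical, k=2)  # one subcluster per allophone
single = fit_gmm_1d(typical, k=1)   # all realisations as one cluster

# 2.1 sits inside an allophone cluster; 0.0 falls between the clusters.
for name, model in [("mixture", mixture), ("single", single)]:
    print(name, score(2.1, model), score(0.0, model))
```

In this toy setup the two-component mixture scores the in-cluster point above the point lying between the clusters, while the single Gaussian ranks them the other way round. Real S3M features are high-dimensional vectors, so a library implementation such as scikit-learn's `GaussianMixture` would be the more practical choice than this hand-written fit.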
