Abstract: Conversion of non-native accented speech to native (American) English has a wide range of applications, such as improving the intelligibility of non-native speech. Previous work in this domain has used phonetic posteriorgrams (PPGs) as the target speech representation to train an acoustic model, which is then used to extract a compact representation of input speech for accent conversion. In this work, we introduce the idea of using an effective articulatory speech representation, extracted from an acoustic-to-articulatory speech inversion system, to improve the acoustic model used in accent conversion. The idea of incorporating articulatory representations stems from their ability to effectively characterize accents in speech. To combine articulatory representations with conventional phonetic posteriorgrams, a multi-task-learning-based acoustic model is proposed. Objective and subjective evaluations show that the use of articulatory representations can improve the effectiveness of accent conversion.
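The multi-task acoustic model described above can be illustrated with a minimal sketch: a shared encoder feeds a phonetic-posteriorgram (classification) head and an articulatory (regression) head, trained with a weighted sum of the two losses. All dimensions, the tract-variable count, and the loss weighting `alpha` below are assumptions for illustration, not values from the paper, and the weights are random placeholders rather than trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 40-dim acoustic frames, 64 hidden units,
# 42 phone classes for the PPG head, 6 articulatory tract variables.
D_IN, D_HID, N_PHONES, N_TVS = 40, 64, 42, 6

# Shared encoder plus two task-specific heads (random placeholder weights).
W_enc = rng.standard_normal((D_IN, D_HID)) * 0.1
W_ppg = rng.standard_normal((D_HID, N_PHONES)) * 0.1
W_art = rng.standard_normal((D_HID, N_TVS)) * 0.1

def forward(frames):
    """Shared hidden layer feeding a PPG head (softmax over phones)
    and an articulatory regression head."""
    h = np.tanh(frames @ W_enc)                       # shared representation
    logits = h @ W_ppg
    ppg = np.exp(logits - logits.max(axis=1, keepdims=True))
    ppg /= ppg.sum(axis=1, keepdims=True)             # per-frame phone posteriors
    tvs = h @ W_art                                   # articulatory trajectories
    return ppg, tvs

def multitask_loss(ppg, phone_ids, tvs, tv_targets, alpha=0.5):
    """Weighted sum of cross-entropy (PPG task) and MSE (articulatory task)."""
    ce = -np.log(ppg[np.arange(len(phone_ids)), phone_ids] + 1e-9).mean()
    mse = ((tvs - tv_targets) ** 2).mean()
    return alpha * ce + (1.0 - alpha) * mse

frames = rng.standard_normal((8, D_IN))               # 8 acoustic frames
ppg, tvs = forward(frames)
loss = multitask_loss(ppg, rng.integers(0, N_PHONES, 8),
                      tvs, rng.standard_normal((8, N_TVS)))
```

In a real system the shared encoder would be a deep network and the articulatory targets would come from the speech inversion system; the sketch only shows how the two objectives share one representation.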
Abstract: Dialog Enhancement (DE) is a feature that allows a user to increase the level of dialog in TV or movie content relative to non-dialog sounds. When only the original mix is available, DE is "unguided" and requires source separation. In this paper, we describe the DeepSpace system, which performs source separation using both dynamic spatial cues and source cues to support unguided DE. Its technologies include spatio-level filtering (SLF) and deep-learning-based dialog classification and denoising. Using subjective listening tests, we show that DeepSpace achieves significantly improved overall performance relative to the state-of-the-art systems available for testing. We also explore the feasibility of using existing automated metrics to evaluate unguided DE systems.
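As a generic illustration of how spatial and source cues can be fused for unguided DE (not the DeepSpace algorithm itself, whose SLF and classifier details are not given here), the sketch below applies a dialog-boosting gain to a time-frequency representation of the mix. The cue arrays, the fusion-by-product rule, and the 6 dB boost are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy magnitude spectrogram of the original mix: frames x frequency bins.
mix = np.abs(rng.standard_normal((100, 257)))

# Stand-ins for the two cue types named in the abstract: a per-bin spatial
# dialog likelihood (e.g. center-panned energy ratio) and a per-frame
# classifier-derived dialog probability.
spatial_cue = rng.uniform(0, 1, mix.shape)
source_cue = rng.uniform(0, 1, (mix.shape[0], 1))

def dialog_enhance(mix, spatial_cue, source_cue, boost_db=6.0):
    """Boost time-frequency bins likely to contain dialog, leaving other
    bins at unity gain; multiplying the cues is one simple fusion rule."""
    mask = spatial_cue * source_cue                   # fused dialog mask in [0, 1]
    gain = 1.0 + (10 ** (boost_db / 20.0) - 1.0) * mask
    return mix * gain

enhanced = dialog_enhance(mix, spatial_cue, source_cue)
```

A production system would estimate the cues from the multichannel signal and a trained classifier, and resynthesize audio from the masked spectrogram; the sketch only shows the cue-fusion and boosting step.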