Jonathan Sherman

EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Feb 07, 2021

Yu-Wen Chen, Kuo-Hsuan Hung, Shang-Yi Chuang, Jonathan Sherman, Wen-Chin Huang, Xugang Lu, Yu Tsao

Figures 1-4 for EMA2S: An End-to-End Multimodal Articulatory-to-Speech System

Abstract: Speech synthesized from articulatory movements can have real-world use for patients with vocal cord disorders, in situations requiring silent speech, or in high-noise environments. In this work, we present EMA2S, an end-to-end multimodal articulatory-to-speech system that directly converts articulatory movements to speech signals. We use a neural-network-based vocoder combined with multimodal joint training, incorporating spectrogram, mel-spectrogram, and deep features. The experimental results confirm that the multimodal approach of EMA2S outperforms the baseline system on both objective and subjective evaluation metrics. Moreover, the results demonstrate that joint training with mel-spectrogram and deep feature losses can effectively improve system performance.
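To make the idea of joint training over several feature types concrete, here is a minimal sketch of a weighted multi-term loss. The abstract does not specify the loss function, the weights, or how the mel-spectrogram and deep features are extracted, so everything below is an assumption for illustration: the L1 distance, the names `l1_loss` and `joint_loss`, the weight values, and the toy feature vectors are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Sequence


def l1_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error between two equal-length feature vectors."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def joint_loss(
    pred: Dict[str, Sequence[float]],
    target: Dict[str, Sequence[float]],
    weights: Dict[str, float],
) -> float:
    """Weighted sum of per-feature L1 losses (assumed form, not the paper's)."""
    return sum(w * l1_loss(pred[name], target[name]) for name, w in weights.items())


# Toy stand-ins for features computed from predicted and reference speech.
pred = {"mel": [0.0, 1.0, 2.0], "deep": [1.0, 1.0]}
target = {"mel": [0.0, 2.0, 4.0], "deep": [1.0, 3.0]}
weights = {"mel": 1.0, "deep": 0.5}  # hypothetical weights

total = joint_loss(pred, target, weights)
print(total)  # 1.5
```

In a real system each entry would come from a feature extractor applied to the generated and reference waveforms, and the weights would be tuned; this sketch only shows how separate loss terms can be combined into one training objective.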
