Siu Wa Lee

A Waveform Representation Framework for High-quality Statistical Parametric Speech Synthesis

Oct 06, 2015

Bo Fan, Siu Wa Lee, Xiaohai Tian, Lei Xie, Minghui Dong

[Figures 1–4 of the paper]

Abstract: State-of-the-art statistical parametric speech synthesis (SPSS) generally uses a vocoder to represent speech signals and parameterize them into features for subsequent modeling. The magnitude spectrum has long been the dominant feature. Although perceptual studies have shown that the phase spectrum is essential to the quality of synthesized speech, phase is often ignored by using a minimum-phase filter during synthesis, and speech quality suffers as a result. To bypass this bottleneck in vocoded speech, this paper proposes a phase-embedded waveform representation framework and establishes a joint magnitude-phase modeling platform for high-quality SPSS. Experiments on waveform reconstruction show that the framework outperforms the widely used STRAIGHT vocoder. Furthermore, the proposed modeling and synthesis platform outperforms a leading-edge vocoded baseline built on a deep bidirectional long short-term memory recurrent neural network (DBLSTM-RNN) on various objective evaluation metrics.
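The abstract's point about minimum-phase synthesis can be made concrete with a small experiment. The sketch below is a generic illustration, not the authors' framework: it assumes NumPy and uses the standard real-cepstrum folding method to build the minimum-phase spectrum that shares a frame's magnitude. The random frame is a stand-in for a speech frame, and the function name `minimum_phase_from_magnitude` is hypothetical.

```python
import numpy as np


def minimum_phase_from_magnitude(mag: np.ndarray) -> np.ndarray:
    """Return a (cepstrally approximated) minimum-phase spectrum with magnitude `mag`.

    `mag` is the full-length FFT magnitude of a real frame. The method takes the
    real cepstrum of log|X|, folds its anticausal half onto the causal half, and
    exponentiates back. With a finite FFT size the result is only approximately
    minimum-phase (cepstral aliasing), but the magnitude is reproduced exactly.
    """
    n = mag.size
    log_mag = np.log(np.maximum(mag, 1e-12))  # floor avoids log(0)
    cep = np.fft.ifft(log_mag).real            # real cepstrum (even sequence)
    fold = np.zeros(n)
    fold[0] = 1.0
    fold[1:(n + 1) // 2] = 2.0                 # double the causal quefrencies
    if n % 2 == 0:
        fold[n // 2] = 1.0                     # keep the Nyquist quefrency once
    return np.exp(np.fft.fft(cep * fold))


# Hypothetical stand-in for one analysis frame of speech.
rng = np.random.default_rng(0)
n = 512
x = rng.standard_normal(n)
X = np.fft.fft(x)

# Keep only the magnitude, as a magnitude-only vocoder would, and rebuild
# the waveform with a minimum-phase spectrum.
X_min = minimum_phase_from_magnitude(np.abs(X))
x_min = np.fft.ifft(X_min).real

print(np.allclose(np.abs(X_min), np.abs(X)))  # True: magnitude preserved
print(np.allclose(x_min, x))                  # False: original phase is lost
```

Because the magnitude matches while the waveform does not, whatever the original phase carried is discarded by this kind of synthesis; that is the information the proposed phase-embedded representation is meant to retain.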

* Accepted and will appear in APSIPA 2015. Keywords: speech synthesis, LSTM-RNN, vocoder, phase, waveform, modeling
