Michael Horgan

An empirical study of Conv-TasNet

Feb 24, 2020

Berkan Kadioglu, Michael Horgan, Xiaoyu Liu, Jordi Pons, Dan Darcy, Vivek Kumar

Abstract: Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset and investigate the generalization capabilities of the studied models when trained on a much larger dataset. We propose cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.

* In Proceedings of ICASSP 2020
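
The abstract above reports its gains in SI-SNR (scale-invariant signal-to-noise ratio). As a rough illustration of that metric, and not the authors' evaluation code, the following NumPy sketch computes SI-SNR in dB for one estimated source against its reference. The function name `si_snr`, the `eps` stabilizer, and the zero-mean normalization step are assumptions made for this example.

```python
import numpy as np


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between a 1-D estimate and its reference.

    Hypothetical helper for illustration only; `eps` is an assumed
    stabilizer that guards against division by zero and log of zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Remove the mean so a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    target = target - target.mean()

    # Project the estimate onto the reference to get the "target" component.
    scale = np.dot(estimate, target) / (np.dot(target, target) + eps)
    s_target = scale * target

    # Everything left over after the projection is treated as noise.
    e_noise = estimate - s_target

    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)


# Toy usage: a reference signal plus a small amount of additive noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
noisy_estimate = reference + 0.1 * rng.standard_normal(16000)
score_db = si_snr(noisy_estimate, reference)
rescaled_score_db = si_snr(3.0 * noisy_estimate, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is why `score_db` and `rescaled_score_db` agree in this toy setup.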

Voice Conversion with Conditional SampleRNN

Aug 24, 2018

Cong Zhou, Michael Horgan, Vivek Kumar, Cristina Vasco, Dan Darcy

Abstract: Here we present a novel approach to conditioning the SampleRNN generative model for voice conversion (VC). Conventional methods for VC modify the perceived speaker identity by converting between source and target acoustic features. Our approach focuses on preserving voice content and depends on the generative network to learn voice style. We first train a multi-speaker SampleRNN model conditioned on linguistic features, pitch contour, and speaker identity using a multi-speaker speech corpus. Voice-converted speech is generated using linguistic features and pitch contour extracted from the source speaker, and the target speaker identity. We demonstrate that our system is capable of many-to-many voice conversion without requiring parallel data, enabling broad applications. Subjective evaluation demonstrates that our approach outperforms conventional VC methods.

* Accepted at Interspeech 2018, Hyderabad, India. This version matches the final version submitted to the conference.
