Julien Diard

Repeat after me: Self-supervised learning of acoustic-to-articulatory mapping by vocal imitation

Apr 05, 2022

Marc-Antoine Georges, Julien Diard, Laurent Girin, Jean-Luc Schwartz, Thomas Hueber

Abstract: We propose a computational model of speech production that combines three components: a pre-trained neural articulatory synthesizer able to reproduce complex speech stimuli from a limited set of interpretable articulatory parameters, a DNN-based internal forward model that predicts the sensory consequences of articulatory commands, and an internal inverse model, based on a recurrent neural network, that recovers articulatory commands from the acoustic speech input. The forward and inverse models are jointly trained in a self-supervised way from raw acoustic-only speech data from different speakers. The imitation simulations are evaluated objectively and subjectively and show encouraging performance.

Dynamical Variational Autoencoders: A Comprehensive Review

Aug 28, 2020

Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

Abstract: The Variational Autoencoder (VAE) is a powerful deep generative model that is now extensively used to represent high-dimensional complex data via a low-dimensional latent space that is learned in an unsupervised manner. In the original VAE model, input data vectors are processed independently. In recent years, a series of papers have presented different extensions of the VAE to sequential data that model not only the latent space but also the temporal dependencies within a sequence of data vectors and/or corresponding latent vectors, relying on recurrent neural networks or state space models. In this paper we perform an extensive literature review of these models. Importantly, we introduce and discuss a general class of models called Dynamical Variational Autoencoders (DVAEs) that encompasses a large subset of these temporal VAE extensions. We then present in detail seven different instances of DVAE that were recently proposed in the literature, with an effort to homogenize the notations and presentation lines, and to relate those models to existing classical temporal models (which are also presented for the sake of completeness). We reimplemented those seven DVAE models and we present the results of an experimental benchmark that we conducted on the speech analysis-resynthesis task (the PyTorch code will be made publicly available). An extensive discussion is presented at the end of the paper, commenting on important issues concerning the DVAE class of models and describing future research guidelines.
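
As a rough illustration of the remark that the original VAE processes input data vectors independently, the sketch below computes the regularization (KL) term of the VAE objective for a sequence of frames. It assumes the common setup of a diagonal-Gaussian encoder and a standard normal prior, which the abstract does not specify; the function names are hypothetical and this is not the authors' PyTorch code. Under these assumptions the KL term for a sequence is simply a sum of per-frame terms, with no coupling across time, which is the restriction the temporal extensions are designed to lift.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ) for one frame."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def sequence_kl(mus, logvars):
    """KL term for a whole sequence when frames are treated independently:
    a plain sum of per-frame terms (assumption: i.i.d. standard normal prior)."""
    return sum(kl_to_standard_normal(m, lv) for m, lv in zip(mus, logvars))


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical encoder outputs for a 3-frame sequence with a 2-D latent space.
    mus = [[0.0, 0.0], [1.0, 0.0], [0.5, -0.5]]
    logvars = [[0.0, 0.0], [0.0, 0.0], [-1.0, -1.0]]
    latents = [reparameterize(m, lv, rng) for m, lv in zip(mus, logvars)]
    print("sampled latents:", latents)
    print("sequence KL:", sequence_kl(mus, logvars))
```

A model with temporal dependencies would replace the fixed standard normal prior with one conditioned on past latent or data vectors, so the per-frame terms would no longer be computable in isolation.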
