RJ Skerry-Ryan

Long-Form Speech Generation with Spoken Language Models

Dec 24, 2024

Zero-Shot Mono-to-Binaural Speech Synthesis

Dec 11, 2024

Very Attentive Tacotron: Robust and Unbounded Length Generalization in Autoregressive Transformer-Based Text-to-Speech

Oct 29, 2024

LMs with a Voice: Spoken Language Modeling beyond Speech Tokens

May 24, 2023

Learning the joint distribution of two sequences using little or no paired data

Dec 06, 2022

Speaker Generation

Nov 07, 2021

Parallel Tacotron 2: A Non-Autoregressive Neural TTS Model with Differentiable Duration Modeling

Apr 13, 2021

Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis

Nov 06, 2020

Non-saturating GAN training as divergence minimization

Oct 15, 2020

Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis

Oct 23, 2019