Yukiya Hono

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

Jun 18, 2024

Release of Pre-Trained Models for the Japanese Language

Apr 02, 2024

PeriodGrad: Towards Pitch-Controllable Neural Vocoder Based on a Diffusion Probabilistic Model

Feb 22, 2024

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Dec 06, 2023

Towards human-like spoken dialogue generation between AI agents from written dialogue

Oct 02, 2023

UniFLG: Unified Facial Landmark Generator from Text or Speech

Feb 28, 2023

Singing voice synthesis based on frame-level sequence-to-sequence models considering vocal timing deviation

Jan 05, 2023

Singing Voice Synthesis Based on a Musical Note Position-Aware Attention Mechanism

Dec 28, 2022

Embedding a Differentiable Mel-cepstral Synthesis Filter to a Neural Speech Synthesis System

Nov 21, 2022

End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue

Jun 24, 2022