Kentaro Mitsui

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

Jun 18, 2024
Via arXiv

Release of Pre-Trained Models for the Japanese Language

Apr 02, 2024
Via arXiv

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

Dec 06, 2023
Via arXiv

Towards human-like spoken dialogue generation between AI agents from written dialogue

Oct 02, 2023
Via arXiv

UniFLG: Unified Facial Landmark Generator from Text or Speech

Feb 28, 2023
Via arXiv

Text-Guided Scene Sketch-to-Photo Synthesis

Feb 14, 2023
Via arXiv

End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue

Jun 24, 2022
Via arXiv

MSR-NV: Neural vocoder using multiple sampling rates

Sep 28, 2021
Via arXiv

Multi-speaker Text-to-speech Synthesis Using Deep Gaussian Processes

Aug 07, 2020
Via arXiv