Naoyuki Kanda

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

Feb 04, 2025
Via arXiv

Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings

Jan 28, 2025
Via arXiv

TS3-Codec: Transformer-Based Simple Streaming Single Codec

Nov 27, 2024
Via arXiv

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

Jul 17, 2024
Via arXiv

E2 TTS: Embarrassingly Easy Fully Non-Autoregressive Zero-Shot TTS

Jun 26, 2024
Via arXiv

An Investigation of Noise Robustness for Flow-Matching-Based Zero-Shot TTS

Jun 09, 2024
Via arXiv

Total-Duration-Aware Duration Modeling for Text-to-Speech Systems

Jun 06, 2024
Via arXiv

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Feb 12, 2024
Via arXiv

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

Jan 16, 2024
Via arXiv

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

Oct 23, 2023
Via arXiv