Yao Qian

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024

VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers

Jun 08, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Sep 25, 2023

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

Aug 03, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

May 24, 2023

i-Code Studio: A Configurable and Composable Framework for Integrative AI

May 23, 2023