Dongmei Wang

Isochrony-Controlled Speech-to-Text Translation: A study on translating from Sino-Tibetan to Indo-European Languages

Nov 11, 2024

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024

Profile-Error-Tolerant Target-Speaker Voice Activity Detection

Sep 21, 2023

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023

Target Sound Extraction with Variable Cross-modality Clues

Mar 15, 2023

Target Speaker Voice Activity Detection with Transformers and Its Integration with End-to-End Neural Diarization

Aug 27, 2022

Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation

Apr 07, 2022

PickNet: Real-Time Channel Selection for Ad Hoc Microphone Arrays

Jan 24, 2022