Eng Siong Chng

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

Nov 29, 2024

Speech Separation using Neural Audio Codecs with Embedding Loss

Nov 27, 2024

NTU-NPU System for Voice Privacy 2024 Challenge

Oct 03, 2024

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs

Sep 24, 2024

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Sep 17, 2024

Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems

Jul 04, 2024

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization

Jul 02, 2024

Temporal-Channel Modeling in Multi-head Self-Attention for Synthetic Speech Detection

Jun 25, 2024

Towards Audio Codec-based Speech Separation

Jun 18, 2024

Dataset-Distillation Generative Model for Speech Emotion Recognition

Jun 05, 2024