Eng Siong Chng

Continual Learning with Embedding Layer Surgery and Task-wise Beam Search using Whisper

Jan 14, 2025
Via arXiv

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model

Jan 13, 2025
Via arXiv

An Investigation on the Potential of KAN in Speech Enhancement

Dec 23, 2024
Via arXiv

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

Nov 29, 2024
Via arXiv

Speech Separation using Neural Audio Codecs with Embedding Loss

Nov 27, 2024
Via arXiv

NTU-NPU System for Voice Privacy 2024 Challenge

Oct 03, 2024
Via arXiv

Bridging Speech and Text: Enhancing ASR with Pinyin-to-Character Pre-training in LLMs

Sep 24, 2024
Via arXiv

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Sep 17, 2024
Via arXiv

Continual Learning Optimizations for Auto-regressive Decoder of Multilingual ASR systems

Jul 04, 2024
Via arXiv

Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization

Jul 02, 2024
Via arXiv