Shengkui Zhao

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Sep 25, 2024

Towards Audio Codec-based Speech Separation

Jun 18, 2024

Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis

Jun 04, 2024

MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation

Dec 19, 2023

SPGM: Prioritizing Local Features for enhanced speech separation performance

Sep 22, 2023

Are Soft Prompts Good Zero-shot Learners for Speech Recognition?

Sep 18, 2023

ACA-Net: Towards Lightweight Speaker Verification using Asymmetric Cross Attention

May 20, 2023

D2Former: A Fully Complex Dual-Path Dual-Decoder Conformer Network using Joint Complex Masking and Complex Spectral Mapping for Monaural Speech Enhancement

Feb 23, 2023

MossFormer: Pushing the Performance Limit of Monaural Speech Separation using Gated Single-Head Transformer with Convolution-Augmented Joint Self-Attentions

Feb 23, 2023

FRCRN: Boosting Feature Representation using Frequency Recurrence for Monaural Speech Enhancement

Jun 15, 2022