Shengkui Zhao

LuSeeL: Language-queried Binaural Universal Sound Event Extraction and Localization

Jan 27, 2026

Beyond Lips: Integrating Gesture and Lip Cues for Robust Audio-visual Speaker Extraction

Jan 27, 2026

FlowSE-GRPO: Training Flow Matching Speech Enhancement via Online Reinforcement Learning

Jan 23, 2026

E2E-AEC: Implementing an end-to-end neural network learning approach for acoustic echo cancellation

Jan 23, 2026

FunAudio-ASR Technical Report

Sep 15, 2025

ClearerVoice-Studio: Bridging Advanced Speech Processing Research and Practical Deployment

Jun 24, 2025

Plug-and-Play Co-Occurring Face Attention for Robust Audio-Visual Speaker Extraction

May 27, 2025

Conditional Latent Diffusion-Based Speech Enhancement Via Dual Context Learning

Jan 17, 2025

HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution

Jan 17, 2025

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Sep 25, 2024