Yang Ai

Beyond WER: A Paired Acoustic Stress Test for Ambient Clinical Scribes

Jun 04, 2026

An Ultra-Low-Bitrate Neural Speech Codec with Plain-to-Pseudo Synergistic Vector Quantization

Jun 04, 2026

VoCodec: A Low-bitrate Streamable Neural Speech Codec with Voicing-driven Quantization

Jun 04, 2026

CFMDCTCodec: A Low-Bitrate Neural Speech Codec with Noise-Prior-aware Conditional Flow Matching for MDCT-Spectral Enhancement

May 26, 2026

Ultra-Low-Bitrate Mel-Spectrogram-based Neural Speech Coding with Flow-Matching-based Refinement and Vocoding-driven Reconstruction

May 25, 2026

CodeSep: Low-Bitrate Codec-Driven Speech Separation with Base-Token Disentanglement and Auxiliary-Token Serial Prediction

Jan 19, 2026

DAIEN-TTS: Disentangled Audio Infilling for Environment-Aware Text-to-Speech Synthesis

Sep 18, 2025

Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding

Sep 04, 2025

Is GAN Necessary for Mel-Spectrogram-based Neural Vocoder?

Aug 11, 2025

Vision-Integrated High-Quality Neural Speech Coding

May 29, 2025