Haibin Wu

T-Mimi: A Transformer-based Mimi Decoder for Real-Time On-Phone TTS

Jan 27, 2026

Do Neural Codecs Generalize? A Controlled Study Across Unseen Languages and Non-Speech Tasks

Jan 18, 2026

How Does Instrumental Music Help SingFake Detection?

Sep 18, 2025

Discrete Audio Tokens: More Than a Survey!

Jun 12, 2025

Towards Generalized Source Tracing for Codec-Based Deepfake Speech

Jun 08, 2025

Towards Efficient Speech-Text Jointly Decoding within One Speech Language Model

Jun 04, 2025

Phi-Omni-ST: A multimodal language model for direct speech-to-speech translation

Jun 04, 2025

Codec-Based Deepfake Source Tracing via Neural Audio Codec Taxonomy

May 19, 2025

Pseudo-Autoregressive Neural Codec Language Models for Efficient Zero-Shot Text-to-Speech Synthesis

Apr 14, 2025

On The Landscape of Spoken Language Models: A Comprehensive Survey

Apr 11, 2025