Haizhou Li

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Sep 11, 2025

NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation

Sep 04, 2025

Interpolating Speaker Identities in Embedding Space for Data Expansion

Aug 26, 2025

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Aug 20, 2025

UniTalker: Conversational Speech-Visual Synthesis

Aug 06, 2025

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

Jul 23, 2025

IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing

Jul 10, 2025

VP-SelDoA: Visual-prompted Selective DoA Estimation of Target Sound via Semantic-Spatial Matching

Jul 10, 2025

SpeechRefiner: Towards Perceptual Quality Refinement for Front-End Algorithms

Jun 16, 2025

Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction

Jun 11, 2025