Haizhou Li

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues

Dec 11, 2024

Alignment at Pre-training! Towards Native Alignment for Arabic LLMs

Dec 04, 2024

Transferable Adversarial Attacks against ASR

Nov 14, 2024

SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

Nov 12, 2024

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Nov 05, 2024

VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions

Oct 29, 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants

Oct 22, 2024

Multi-Level Speaker Representation for Target Speaker Extraction

Oct 21, 2024

Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement

Oct 18, 2024

Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

Oct 18, 2024