Haizhou Li

Transferable Adversarial Attacks against ASR

Nov 14, 2024

SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

Nov 12, 2024

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Nov 05, 2024

VLMimic: Vision Language Models are Visual Imitation Learner for Fine-grained Actions

Oct 29, 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants

Oct 22, 2024

Multi-Level Speaker Representation for Target Speaker Extraction

Oct 21, 2024

Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement

Oct 18, 2024

Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech

Oct 18, 2024

Roadmap towards Superhuman Speech Understanding using Large Language Models

Oct 17, 2024

Emphasis Rendering for Conversational Text-to-Speech with Multi-modal Multi-scale Context Modeling

Oct 12, 2024