Haizhou Li

CATCH: A Controllable Theme Detection Framework with Contextualized Clustering and Hierarchical Generation

Dec 25, 2025
Via arXiv

ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction

Nov 09, 2025
Via arXiv

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

Oct 26, 2025
Via arXiv

EchoX: Towards Mitigating Acoustic-Semantic Gap via Echo Training for Speech-to-Speech LLMs

Sep 11, 2025
Via arXiv

NE-PADD: Leveraging Named Entity Knowledge for Robust Partial Audio Deepfake Detection via Attention Aggregation

Sep 04, 2025
Via arXiv

Interpolating Speaker Identities in Embedding Space for Data Expansion

Aug 26, 2025
Via arXiv

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Aug 20, 2025
Via arXiv

UniTalker: Conversational Speech-Visual Synthesis

Aug 06, 2025
Via arXiv

Accent Normalization Using Self-Supervised Discrete Tokens with Non-Parallel Data

Jul 23, 2025
Via arXiv

IML-Spikeformer: Input-aware Multi-Level Spiking Transformer for Speech Processing

Jul 10, 2025
Via arXiv