Haizhou Li

Soundwave: Less is More for Speech-Text Alignment in LLMs

Feb 18, 2025
Via arXiv

Should Audio Front-ends be Adaptive? Comparing Learnable and Adaptive Front-ends

Feb 05, 2025
Via arXiv

PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning

Jan 16, 2025
Via arXiv

ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification

Jan 14, 2025
Via arXiv

Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis

Jan 11, 2025
Via arXiv

Binary Event-Driven Spiking Transformer

Jan 10, 2025
Via arXiv

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition

Jan 03, 2025
Via arXiv

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

Dec 18, 2024
Via arXiv

Hierarchical Control of Emotion Rendering in Speech Synthesis

Dec 17, 2024
Via arXiv

Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech

Dec 17, 2024
Via arXiv