Haizhou Li

PAL: Prompting Analytic Learning with Missing Modality for Multi-Modal Class-Incremental Learning

Jan 16, 2025
Via arXiv

ExPO: Explainable Phonetic Trait-Oriented Network for Speaker Verification

Jan 14, 2025
Via arXiv

Retrieval-Augmented Dialogue Knowledge Aggregation for Expressive Conversational Speech Synthesis

Jan 11, 2025
Via arXiv

Binary Event-Driven Spiking Transformer

Jan 10, 2025
Via arXiv

Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition

Jan 03, 2025
Via arXiv

SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor

Dec 18, 2024
Via arXiv

Hierarchical Control of Emotion Rendering in Speech Synthesis

Dec 17, 2024
Via arXiv

Multi-modal and Multi-scale Spatial Environment Understanding for Immersive Visual Text-to-Speech

Dec 17, 2024
Via arXiv

Second Language (Arabic) Acquisition of LLMs via Progressive Vocabulary Expansion

Dec 16, 2024
Via arXiv

MoMuSE: Momentum Multi-modal Target Speaker Extraction for Real-time Scenarios with Impaired Visual Cues

Dec 11, 2024
Via arXiv