Xinyuan Qian

I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception

Nov 20, 2024

SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

Nov 12, 2024

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection

Sep 11, 2024

Text-Queried Target Sound Event Localization

Jun 23, 2024

An Exploration of Length Generalization in Transformer-Based Speech Enhancement

Jun 17, 2024

Mamba in Speech: Towards an Alternative to Self-Attention

May 22, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

Apr 29, 2024

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

Apr 01, 2024

Audio-Visual Speaker Tracking: Progress, Challenges, and Future Directions

Oct 23, 2023

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

Oct 17, 2023