Picture for Xinyuan Qian

Xinyuan Qian

FaceSpeak: Expressive and High-Quality Speech Synthesis from Human Portraits of Different Styles

Add code
Jan 02, 2025
Viaarxiv icon

Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition

Add code
Jan 01, 2025
Figure 1 for Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
Figure 2 for Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
Figure 3 for Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
Figure 4 for Breaking Through the Spike: Spike Window Decoding for Accelerated and Precise Automatic Speech Recognition
Viaarxiv icon

I2TTS: Image-indicated Immersive Text-to-speech Synthesis with Spatial Perception

Add code
Nov 20, 2024
Viaarxiv icon

SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model

Add code
Nov 12, 2024
Figure 1 for SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
Figure 2 for SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
Figure 3 for SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
Figure 4 for SAV-SE: Scene-aware Audio-Visual Speech Enhancement with Selective State Space Model
Viaarxiv icon

Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection

Add code
Sep 11, 2024
Figure 1 for Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
Figure 2 for Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
Figure 3 for Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
Figure 4 for Analytic Class Incremental Learning for Sound Source Localization with Privacy Protection
Viaarxiv icon

Text-Queried Target Sound Event Localization

Add code
Jun 23, 2024
Figure 1 for Text-Queried Target Sound Event Localization
Figure 2 for Text-Queried Target Sound Event Localization
Figure 3 for Text-Queried Target Sound Event Localization
Figure 4 for Text-Queried Target Sound Event Localization
Viaarxiv icon

An Exploration of Length Generalization in Transformer-Based Speech Enhancement

Add code
Jun 17, 2024
Figure 1 for An Exploration of Length Generalization in Transformer-Based Speech Enhancement
Figure 2 for An Exploration of Length Generalization in Transformer-Based Speech Enhancement
Figure 3 for An Exploration of Length Generalization in Transformer-Based Speech Enhancement
Figure 4 for An Exploration of Length Generalization in Transformer-Based Speech Enhancement
Viaarxiv icon

Mamba in Speech: Towards an Alternative to Self-Attention

Add code
May 22, 2024
Figure 1 for Mamba in Speech: Towards an Alternative to Self-Attention
Figure 2 for Mamba in Speech: Towards an Alternative to Self-Attention
Figure 3 for Mamba in Speech: Towards an Alternative to Self-Attention
Figure 4 for Mamba in Speech: Towards an Alternative to Self-Attention
Viaarxiv icon

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

Add code
Apr 29, 2024
Figure 1 for Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Figure 2 for Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Figure 3 for Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Figure 4 for Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention
Viaarxiv icon

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

Add code
Apr 01, 2024
Figure 1 for Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
Figure 2 for Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
Figure 3 for Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
Figure 4 for Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training
Viaarxiv icon