Yujun Wang

Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge

Sep 16, 2024

Efficient Extraction of Noise-Robust Discrete Units from Self-Supervised Speech Models

Sep 04, 2024

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Jun 19, 2024

Bridging Language Gaps in Audio-Text Retrieval

Jun 11, 2024

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

Jun 11, 2024

Scaling up masked audio encoder learning for general audio classification

Jun 11, 2024

CED: Consistent ensemble distillation for audio tagging

Sep 08, 2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Jun 28, 2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Jun 28, 2023

AV-SepFormer: Cross-Attention SepFormer for Audio-Visual Target Speaker Extraction

Jun 25, 2023