Yujun Wang

Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering

Dec 16, 2024

Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge

Sep 16, 2024

Efficient Extraction of Noise-Robust Discrete Units from Self-Supervised Speech Models

Sep 04, 2024

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Jun 19, 2024

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

Jun 11, 2024

Scaling up masked audio encoder learning for general audio classification

Jun 11, 2024

Bridging Language Gaps in Audio-Text Retrieval

Jun 11, 2024

CED: Consistent ensemble distillation for audio tagging

Sep 08, 2023

Enhanced Neural Beamformer with Spatial Information for Target Speech Extraction

Jun 28, 2023

Focus on the Sound around You: Monaural Target Speaker Extraction via Distance and Speaker Information

Jun 28, 2023