Yujun Wang

Beyond Accuracy: Community Perspectives on Machine Translation

Jun 08, 2026
Via arXiv

EchoRL: Reinforcement Learning via Rollout Echoing

May 29, 2026
Via arXiv

ASCD: Attention-Steerable Contrastive Decoding for Reducing Hallucination in MLLM

Jun 17, 2025
Via arXiv

The ICME 2025 Audio Encoder Capability Challenge

Jan 25, 2025
Via arXiv

Visual Instruction Tuning with 500x Fewer Parameters through Modality Linear Representation-Steering

Dec 16, 2024
Via arXiv

Optimizing Dysarthria Wake-Up Word Spotting: An End-to-End Approach for SLT 2024 LRDWWS Challenge

Sep 16, 2024
Via arXiv

Efficient Extraction of Noise-Robust Discrete Units from Self-Supervised Speech Models

Sep 04, 2024
Via arXiv

Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding

Jun 19, 2024
Via arXiv

Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling

Jun 11, 2024
Via arXiv

Bridging Language Gaps in Audio-Text Retrieval

Jun 11, 2024
Via arXiv