Mu Yang

Emotion-Aware Prefix: Towards Explicit Emotion Control in Voice Conversion Models

Mar 10, 2026
Via arXiv

Bridging the Modality Gap: Softly Discretizing Audio Representation for LLM-based Automatic Speech Recognition

Jun 06, 2025
Via arXiv

Autoregressive Meta-Actions for Unified Controllable Trajectory Generation

May 29, 2025
Via arXiv

DRoPE: Directional Rotary Position Embedding for Efficient Agent Interaction Modeling

Mar 19, 2025
Via arXiv

UniScene: Unified Occupancy-centric Driving Scene Generation

Dec 06, 2024
Via arXiv

Audiobox TTA-RAG: Improving Zero-Shot and Few-Shot Text-To-Audio with Retrieval-Augmented Generation

Nov 07, 2024
Via arXiv

DiariST: Streaming Speech Translation with Speaker Diarization

Sep 14, 2023
Via arXiv

What Can an Accent Identifier Learn? Probing Phonetic and Prosodic Information in a Wav2vec2-based Accent Identification Model

Jun 10, 2023
Via arXiv

Learning ASR pathways: A sparse multilingual ASR model

Sep 13, 2022
Via arXiv

Improving Mispronunciation Detection with Wav2vec2-based Momentum Pseudo-Labeling for Accentedness and Intelligibility Assessment

Apr 07, 2022
Via arXiv