Picture for Chunyu Qiang

Chunyu Qiang

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Add code
Dec 11, 2024
Viaarxiv icon

EmoPro: A Prompt Selection Strategy for Emotional Expression in LM-based Speech Synthesis

Add code
Sep 27, 2024
Viaarxiv icon

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Add code
Sep 18, 2024
Figure 1 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 2 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 3 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 4 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Viaarxiv icon

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Add code
Sep 14, 2024
Figure 1 for Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Figure 2 for Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Figure 3 for Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Figure 4 for Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Viaarxiv icon

VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing

Add code
Aug 11, 2024
Viaarxiv icon

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

Add code
Jul 07, 2024
Figure 1 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Figure 2 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Figure 3 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Figure 4 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Viaarxiv icon

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Add code
Jun 15, 2024
Figure 1 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Figure 2 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Figure 3 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Figure 4 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Viaarxiv icon

An Initial Investigation of Language Adaptation for TTS Systems under Low-resource Scenarios

Add code
Jun 13, 2024
Viaarxiv icon

PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation

Add code
Jun 07, 2024
Figure 1 for PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Figure 2 for PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Figure 3 for PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Figure 4 for PPPR: Portable Plug-in Prompt Refiner for Text to Audio Generation
Viaarxiv icon

High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models

Add code
Sep 27, 2023
Viaarxiv icon