Picture for Yukun Liu

Yukun Liu

Mel-Refine: A Plug-and-Play Approach to Refine Mel-Spectrogram in Audio Generation

Add code
Dec 11, 2024
Viaarxiv icon

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Add code
Sep 18, 2024
Figure 1 for Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
Figure 2 for Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
Figure 3 for Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
Figure 4 for Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0
Viaarxiv icon

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Add code
Sep 18, 2024
Figure 1 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 2 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 3 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Figure 4 for DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech
Viaarxiv icon

Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?

Add code
Aug 20, 2024
Figure 1 for Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Figure 2 for Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Figure 3 for Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Figure 4 for Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Viaarxiv icon

EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech

Add code
Aug 20, 2024
Figure 1 for EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Figure 2 for EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Figure 3 for EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Figure 4 for EELE: Exploring Efficient and Extensible LoRA Integration in Emotional Text-to-Speech
Viaarxiv icon

A Noval Feature via Color Quantisation for Fake Audio Detection

Add code
Aug 20, 2024
Viaarxiv icon

ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation

Add code
Jul 07, 2024
Figure 1 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Figure 2 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Figure 3 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Figure 4 for ASRRL-TTS: Agile Speaker Representation Reinforcement Learning for Text-to-Speech Speaker Adaptation
Viaarxiv icon

Fake News Detection and Manipulation Reasoning via Large Vision-Language Models

Add code
Jul 02, 2024
Viaarxiv icon

MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation

Add code
Jun 15, 2024
Figure 1 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Figure 2 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Figure 3 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Figure 4 for MINT: a Multi-modal Image and Narrative Text Dubbing Dataset for Foley Audio Content Planning and Generation
Viaarxiv icon

Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio

Add code
Jun 12, 2024
Viaarxiv icon