Picture for Zhizheng Wu

Zhizheng Wu

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

Add code
Nov 29, 2024
Figure 1 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Figure 2 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Figure 3 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Figure 4 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Viaarxiv icon

Debatts: Zero-Shot Debating Text-to-Speech Synthesis

Add code
Nov 10, 2024
Viaarxiv icon

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Add code
Oct 13, 2024
Figure 1 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Figure 2 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Figure 3 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Figure 4 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Viaarxiv icon

SpMis: An Investigation of Synthetic Spoken Misinformation Detection

Add code
Sep 17, 2024
Figure 1 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Figure 2 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Figure 3 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Figure 4 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Viaarxiv icon

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Add code
Sep 06, 2024
Figure 1 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 2 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 3 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 4 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Viaarxiv icon

MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer

Add code
Sep 01, 2024
Viaarxiv icon

Foundation Models for Music: A Survey

Add code
Aug 27, 2024
Figure 1 for Foundation Models for Music: A Survey
Figure 2 for Foundation Models for Music: A Survey
Figure 3 for Foundation Models for Music: A Survey
Figure 4 for Foundation Models for Music: A Survey
Viaarxiv icon

Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation

Add code
Jul 07, 2024
Figure 1 for Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Figure 2 for Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Figure 3 for Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Figure 4 for Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Viaarxiv icon

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Add code
Jul 03, 2024
Figure 1 for PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Figure 2 for PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Figure 3 for PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation
Viaarxiv icon

AudioTime: A Temporally-aligned Audio-text Benchmark Dataset

Add code
Jul 03, 2024
Figure 1 for AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
Figure 2 for AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
Figure 3 for AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
Figure 4 for AudioTime: A Temporally-aligned Audio-text Benchmark Dataset
Viaarxiv icon