Picture for Zhizheng Wu

Zhizheng Wu

Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement

Add code
Feb 11, 2025
Viaarxiv icon

Metis: A Foundation Speech Generation Model with Masked Generative Pre-training

Add code
Feb 05, 2025
Viaarxiv icon

Emilia: A Large-Scale, Extensive, Multilingual, and Diverse Dataset for Speech Generation

Add code
Jan 27, 2025
Viaarxiv icon

Overview of the Amphion Toolkit (v0.2)

Add code
Jan 26, 2025
Viaarxiv icon

AnyEnhance: A Unified Generative Model with Prompt-Guidance and Self-Critic for Voice Enhancement

Add code
Jan 26, 2025
Viaarxiv icon

Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities

Add code
Nov 29, 2024
Figure 1 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Figure 2 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Figure 3 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Figure 4 for Noro: A Noise-Robust One-shot Voice Conversion System with Hidden Speaker Representation Capabilities
Viaarxiv icon

Debatts: Zero-Shot Debating Text-to-Speech Synthesis

Add code
Nov 10, 2024
Figure 1 for Debatts: Zero-Shot Debating Text-to-Speech Synthesis
Figure 2 for Debatts: Zero-Shot Debating Text-to-Speech Synthesis
Figure 3 for Debatts: Zero-Shot Debating Text-to-Speech Synthesis
Figure 4 for Debatts: Zero-Shot Debating Text-to-Speech Synthesis
Viaarxiv icon

LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models

Add code
Oct 13, 2024
Figure 1 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Figure 2 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Figure 3 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Figure 4 for LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models
Viaarxiv icon

SpMis: An Investigation of Synthetic Spoken Misinformation Detection

Add code
Sep 17, 2024
Figure 1 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Figure 2 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Figure 3 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Figure 4 for SpMis: An Investigation of Synthetic Spoken Misinformation Detection
Viaarxiv icon

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Add code
Sep 06, 2024
Figure 1 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 2 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 3 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 4 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Viaarxiv icon