Picture for Zhiyong Wu

Zhiyong Wu

$φ$-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation

Add code
Mar 17, 2025
Viaarxiv icon

DiffCSS: Diverse and Expressive Conversational Speech Synthesis with Diffusion Models

Add code
Feb 27, 2025
Viaarxiv icon

Implicit Search via Discrete Diffusion: A Study on Chess

Add code
Feb 27, 2025
Viaarxiv icon

Singing Voice Conversion with Accompaniment Using Self-Supervised Representation-Based Melody Features

Add code
Feb 07, 2025
Viaarxiv icon

Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data

Add code
Jan 19, 2025
Figure 1 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 2 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 3 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Figure 4 for Leveraging Chain of Thought towards Empathetic Spoken Dialogue without Corresponding Question-Answering Data
Viaarxiv icon

learning discriminative features from spectrograms using center loss for speech emotion recognition

Add code
Jan 02, 2025
Viaarxiv icon

Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT

Add code
Jan 02, 2025
Figure 1 for Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Figure 2 for Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Figure 3 for Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Figure 4 for Disambiguation of Chinese Polyphones in an End-to-End Framework with Semantic Features Extracted by Pre-trained BERT
Viaarxiv icon

OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Add code
Dec 27, 2024
Viaarxiv icon

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

Add code
Dec 12, 2024
Viaarxiv icon

Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling

Add code
Dec 06, 2024
Figure 1 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 2 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 3 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Figure 4 for Expanding Performance Boundaries of Open-Source Multimodal Models with Model, Data, and Test-Time Scaling
Viaarxiv icon