Zhifang Guo

Qwen3-ASR Technical Report

Jan 29, 2026

Qwen3-TTS Technical Report

Jan 22, 2026

Speech Token Prediction via Compressed-to-fine Language Modeling for Speech Generation

May 30, 2025

Qwen2.5-Omni Technical Report

Mar 26, 2025

InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training

Mar 04, 2025

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

Sep 28, 2024

Advancing Multi-grained Alignment for Contrastive Language-Audio Pre-training

Aug 15, 2024

Qwen2-Audio Technical Report

Jul 15, 2024

PromptTTS 2: Describing and Generating Voices with Text Prompt

Sep 05, 2023

Audio Generation with Multiple Conditional Diffusion Model

Aug 23, 2023