Chenxing Li

SRC-gAudio: Sampling-Rate-Controlled Audio Generation

Oct 09, 2024
Via arXiv

DPI-TTS: Directional Patch Interaction for Fast-Converging and Style Temporal Modeling in Text-to-Speech

Sep 18, 2024
Via arXiv

Mixture of Experts Fusion for Fake Audio Detection Using Frozen wav2vec 2.0

Sep 18, 2024
Via arXiv

EzAudio: Enhancing Text-to-Audio Generation with Efficient Diffusion Transformer

Sep 17, 2024
Via arXiv

Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation

Sep 14, 2024
Via arXiv

Towards Diverse and Efficient Audio Captioning via Diffusion Models

Sep 14, 2024
Via arXiv

STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment

Sep 13, 2024
Via arXiv

Video-to-Audio Generation with Hidden Alignment

Jul 10, 2024
Via arXiv

HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model

Jun 17, 2024
Via arXiv

Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation

Apr 17, 2024
Via arXiv