Jiahao Pan

YuE: Scaling Open Foundation Models for Long-Form Music Generation

Mar 11, 2025

Spark-TTS: An Efficient LLM-Based Text-to-Speech Model with Single-Stream Decoupled Speech Tokens

Mar 03, 2025

Audio-FLAN: A Preliminary Release

Feb 23, 2025

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Aug 30, 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Jul 30, 2024

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Jun 06, 2024

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

May 27, 2024

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Apr 30, 2024

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Apr 25, 2024

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

Nov 29, 2023