Jiahao Pan

Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model

Aug 30, 2024
Via arXiv

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

Jul 30, 2024
Via arXiv

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Jun 06, 2024
Via arXiv

CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild

May 27, 2024
Via arXiv

ComposerX: Multi-Agent Symbolic Music Composition with LLMs

Apr 30, 2024
Via arXiv

FlashSpeech: Efficient Zero-Shot Speech Synthesis

Apr 25, 2024
Via arXiv

Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation

Nov 29, 2023
Via arXiv

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

Jul 07, 2023
Via arXiv