Rongjie Huang

MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
Nov 04, 2024

OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
Oct 28, 2024

FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Oct 16, 2024

MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
Oct 09, 2024

TCSinger: Zero-Shot Singing Voice Synthesis with Style Transfer and Multi-Level Style Control
Sep 26, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Aug 29, 2024

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization
Aug 22, 2024

MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Jul 18, 2024

OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Jul 16, 2024

Accompanied Singing Voice Synthesis with Fully Text-controlled Melody
Jul 02, 2024