Kaizhi Qian

UniMuMo: Unified Text, Music and Motion Generation

Oct 06, 2024

Towards Unsupervised Speech Recognition Without Pronunciation Models

Jun 12, 2024

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

May 30, 2024

Decomposing Uncertainty for Large Language Models through Input Clarification Ensembling

Nov 15, 2023

Master-ASR: Achieving Multilingual Scalability and Low-Resource Adaptation in ASR with Modular Learning

Jun 23, 2023

Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos

Apr 11, 2023

Losses Can Be Blessings: Routing Self-Supervised Speech Representations Towards Efficient Multilingual and Multitask Speech Processing

Nov 02, 2022

Improving Self-Supervised Speech Representations by Disentangling Speakers

Apr 20, 2022

WAVPROMPT: Towards Few-Shot Spoken Language Understanding with Frozen Language Models

Apr 14, 2022

Unsupervised Text-to-Speech Synthesis by Unsupervised Automatic Speech Recognition

Mar 29, 2022