Mingbo Ma

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

Apr 11, 2024

Efficient Neural Music Generation

May 25, 2023

Non-parallel Accent Conversion using Pseudo Siamese Disentanglement Network

Dec 12, 2022

Data-Driven Adaptive Simultaneous Machine Translation

Apr 27, 2022

A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

Mar 18, 2022

Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Jun 11, 2021

Fused Acoustic and Text Encoding for Multimodal Bilingual Pretraining and Speech Translation

Feb 10, 2021

MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation

Oct 22, 2020

Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training

Oct 21, 2020

Improving Simultaneous Translation with Pseudo References

Oct 21, 2020