Michael Zeng

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Sep 06, 2024
Via arXiv

TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation

May 28, 2024
Via arXiv

CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations

Apr 10, 2024
Via arXiv

Making Flow-Matching-Based Zero-Shot Text-to-Speech Laugh as You Like

Feb 12, 2024
Via arXiv

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Nov 10, 2023
Via arXiv

Diffusion Conditional Expectation Model for Efficient and Robust Target Speech Extraction

Sep 25, 2023
Via arXiv

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

Aug 03, 2023
Via arXiv

Adapting Multi-Lingual ASR Models for Handling Multiple Talkers

May 30, 2023
Via arXiv

ComSL: A Composite Speech-Language Model for End-to-End Speech-to-Text Translation

May 24, 2023
Via arXiv

i-Code Studio: A Configurable and Composable Framework for Integrative AI

May 23, 2023
Via arXiv