Picture for Jinyu Li

Jinyu Li

Fred

AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM

Add code
Dec 02, 2024
Viaarxiv icon

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Add code
Nov 29, 2024
Viaarxiv icon

TS3-Codec: Transformer-Based Simple Streaming Single Codec

Add code
Nov 27, 2024
Viaarxiv icon

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Add code
Oct 27, 2024
Figure 1 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Figure 2 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Figure 3 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Figure 4 for ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation
Viaarxiv icon

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

Add code
Oct 17, 2024
Figure 1 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 2 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 3 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Figure 4 for Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation
Viaarxiv icon

CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation

Add code
Oct 07, 2024
Figure 1 for CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Figure 2 for CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Figure 3 for CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Figure 4 for CTC-GMM: CTC guided modality matching for fast and accurate streaming speech translation
Viaarxiv icon

Target word activity detector: An approach to obtain ASR word boundaries without lexicon

Add code
Sep 20, 2024
Viaarxiv icon

Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation

Add code
Sep 06, 2024
Figure 1 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 2 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 3 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Figure 4 for Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Viaarxiv icon

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech

Add code
Jul 17, 2024
Figure 1 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Figure 2 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Figure 3 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Figure 4 for Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech
Viaarxiv icon

Autoregressive Speech Synthesis without Vector Quantization

Add code
Jul 11, 2024
Viaarxiv icon