Jinyu Li

Fred

Sce2DriveX: A Generalized MLLM Framework for Scene-to-Drive Learning

Feb 19, 2025
Via arXiv

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

Feb 04, 2025
Via arXiv

Addressing speaker gender bias in large scale speech translation systems

Jan 10, 2025
Via arXiv

Interleaved Speech-Text Language Models are Simple Streaming Text to Speech Synthesizers

Dec 23, 2024
Via arXiv

SLAM-Omni: Timbre-Controllable Voice Interaction System with Single-Stage Training

Dec 20, 2024
Via arXiv

AlignFormer: Modality Matching Can Achieve Better Zero-shot Instruction-Following Speech-LLM

Dec 02, 2024
Via arXiv

V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow

Nov 29, 2024
Via arXiv

TS3-Codec: Transformer-Based Simple Streaming Single Codec

Nov 27, 2024
Via arXiv

ARLON: Boosting Diffusion Transformers with Autoregressive Models for Long Video Generation

Oct 27, 2024
Via arXiv

Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation

Oct 17, 2024
Via arXiv