Picture for Luyao Cheng

Luyao Cheng

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Add code
Oct 23, 2024
Figure 1 for OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Figure 2 for OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Figure 3 for OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Figure 4 for OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Viaarxiv icon

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Add code
Aug 22, 2024
Viaarxiv icon

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Add code
Jun 17, 2024
Viaarxiv icon

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Add code
Jun 04, 2024
Viaarxiv icon

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Add code
Mar 29, 2024
Viaarxiv icon

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Add code
Sep 19, 2023
Figure 1 for Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
Figure 2 for Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
Figure 3 for Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
Figure 4 for Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation
Viaarxiv icon

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Add code
Jun 28, 2023
Figure 1 for 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Figure 2 for 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Figure 3 for 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Figure 4 for 3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement
Viaarxiv icon

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Add code
May 22, 2023
Viaarxiv icon

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

Add code
May 22, 2023
Viaarxiv icon

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

Add code
Mar 02, 2023
Viaarxiv icon