Luyao Cheng

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

Oct 23, 2024
Via arXiv

Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization

Aug 22, 2024
Via arXiv

Self-Distillation Prototypes Network: Learning Robust Speaker Representations without Supervision

Jun 17, 2024
Via arXiv

ERes2NetV2: Boosting Short-Duration Speaker Verification Performance with Computational Efficiency

Jun 04, 2024
Via arXiv

3D-Speaker-Toolkit: An Open Source Toolkit for Multi-modal Speaker Verification and Diarization

Mar 29, 2024
Via arXiv

Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

Sep 19, 2023
Via arXiv

3D-Speaker: A Large-Scale Multi-Device, Multi-Distance, and Multi-Dialect Corpus for Speech Representation Disentanglement

Jun 28, 2023
Via arXiv

An Enhanced Res2Net with Local and Global Feature Fusion for Speaker Verification

May 22, 2023
Via arXiv

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

May 22, 2023
Via arXiv

CAM++: A Fast and Efficient Network for Speaker Verification Using Context-Aware Masking

Mar 02, 2023
Via arXiv