Lei Xie

Ph.D. Program in Computer Science, The Graduate Center, The City University of New York, New York, New York, USA; Ph.D. Program in Biology and Biochemistry, The Graduate Center, The City University of New York, New York, New York, USA; Department of Computer Science, Hunter College, The City University of New York, New York, New York, USA; Helen and Robert Appel Alzheimer's Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine, Cornell University, New York, New York, USA

Three-Dimensional Sparse Random Mode Decomposition for Mode Disentangling with Crossover Instantaneous Frequencies

Jan 25, 2025

Adaptive Data Augmentation with NaturalSpeech3 for Far-field Speaker Verification

Jan 15, 2025

DiffAttack: Diffusion-based Timbre-reserved Adversarial Attack in Speaker Identification

Jan 09, 2025

FleSpeech: Flexibly Controllable Speech Generation with Various Prompts

Jan 08, 2025

ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training

Jan 08, 2025

Autoregressive Speech Synthesis with Next-Distribution Prediction

Dec 22, 2024

CAMEL: Cross-Attention Enhanced Mixture-of-Experts and Language Bias for Code-Switching Speech Recognition

Dec 17, 2024

YingSound: Video-Guided Sound Effects Generation with Multi-modal Chain-of-Thought Controls

Dec 12, 2024

StableVC: Style Controllable Zero-Shot Voice Conversion with Conditional Flow Matching

Dec 10, 2024

SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR

Dec 07, 2024