Zexu Pan

pTSE-T: Presentation Target Speaker Extraction using Unaligned Text Cues

Nov 05, 2024

Speech Separation with Pretrained Frontend to Minimize Domain Mismatch

Nov 05, 2024

Emotional Dimension Control in Language Model-Based Text-to-Speech: Spanning a Broad Spectrum of Human Emotions

Sep 25, 2024

TF-Locoformer: Transformer with Local Modeling by Convolution for Speech Separation and Enhancement

Aug 06, 2024

Enhanced Reverberation as Supervision for Unsupervised Speech Separation

Aug 06, 2024

NIIRF: Neural IIR Filter Field for HRTF Upsampling and Personalization

Feb 27, 2024

NeuroHeed+: Improving Neuro-steered Speaker Extraction with Joint Auditory Attention Detection

Dec 12, 2023

Scenario-Aware Audio-Visual TF-GridNet for Target Speech Extraction

Oct 30, 2023

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

Oct 17, 2023

Generation or Replication: Auscultating Audio Latent Diffusion Models

Oct 16, 2023