Yidi Jiang

Unified Audio Event Detection

Sep 13, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Sep 07, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization

Jul 25, 2024

Target Speech Diarization with Multimodal Prompts

Jun 11, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

Apr 29, 2024

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

Apr 01, 2024

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

Dec 26, 2023

Prompt-driven Target Speech Diarization

Oct 23, 2023

EEG-Derived Voice Signature for Attended Speaker Detection

Aug 28, 2023