Yidi Jiang

Streaming Keyword Spotting Boosted by Cross-layer Discrimination Consistency

Dec 17, 2024

WavChat: A Survey of Spoken Dialogue Models

Nov 26, 2024

Unified Audio Event Detection

Sep 13, 2024

Flow-TSVAD: Target-Speaker Voice Activity Detection via Latent Flow Matching

Sep 07, 2024

WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling

Aug 29, 2024

Multi-Stage Face-Voice Association Learning with Keynote Speaker Diarization

Jul 25, 2024

Target Speech Diarization with Multimodal Prompts

Jun 11, 2024

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

Apr 29, 2024

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

Apr 01, 2024

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

Dec 26, 2023