Speaker Diarization


Speaker diarization is the process of segmenting and clustering speech signals to identify different speakers in an audio recording.
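To make "segmenting and clustering" concrete, here is a minimal sketch. It does not reproduce the method of any paper listed below. It assumes speech has already been cut into segments and that each segment already has a speaker embedding vector from some embedding model (not shown; the 2-D vectors in the example are synthetic). Segments are assigned to speakers by a simple greedy cosine-similarity clustering with a hypothetical threshold of 0.8, and adjacent segments from the same speaker are then merged into turns. The helper names `cosine` and `diarize` are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)

def diarize(segments, threshold=0.8):
    """Greedy online clustering of (start, end, embedding) segments.

    Each segment joins the most similar existing speaker if the cosine similarity
    to that speaker's running-mean embedding is >= threshold; otherwise it starts
    a new speaker. Returns merged (start, end, label) turns.
    """
    centroids, counts, labelled = [], [], []
    for start, end, emb in segments:
        best, best_sim = None, -1.0
        for k, c in enumerate(centroids):
            sim = cosine(emb, c)
            if sim > best_sim:
                best, best_sim = k, sim
        if best is None or best_sim < threshold:
            # No sufficiently similar speaker yet: open a new cluster.
            centroids.append(list(emb))
            counts.append(1)
            best = len(centroids) - 1
        else:
            # Update the chosen speaker's running-mean embedding.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
        labelled.append((start, end, f"SPK{best}"))

    # Merge contiguous segments that were assigned the same speaker.
    merged = []
    for start, end, spk in labelled:
        if merged and merged[-1][2] == spk and merged[-1][1] == start:
            merged[-1] = (merged[-1][0], end, spk)
        else:
            merged.append((start, end, spk))
    return merged

segments = [
    (0.0, 1.0, [1.0, 0.0]),
    (1.0, 2.0, [0.9, 0.1]),
    (2.0, 3.0, [0.0, 1.0]),
    (3.0, 4.0, [0.95, 0.05]),
]
print(diarize(segments))
# [(0.0, 2.0, 'SPK0'), (2.0, 3.0, 'SPK1'), (3.0, 4.0, 'SPK0')]
```

Practical systems typically use stronger clustering (for example, agglomerative or spectral clustering over all segments at once) and must also handle overlapping speech, which this greedy sketch ignores.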

SeniorTalk: A Chinese Conversation Dataset with Rich Annotations for Super-Aged Seniors

Mar 20, 2025

Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge

Feb 18, 2025

Afrispeech-Dialog: A Benchmark Dataset for Spontaneous English Conversations in Healthcare and Beyond

Feb 06, 2025

Language Modelling for Speaker Diarization in Telephonic Interviews

Jan 28, 2025

SCDiar: a streaming diarization system based on speaker change detection and speech recognition

Jan 28, 2025

SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models

Jan 14, 2025

Unsupervised Speech Segmentation: A General Approach Using Speech Language Models

Jan 07, 2025

Universal Speaker Embedding Free Target Speaker Extraction and Personal Voice Activity Detection

Jan 07, 2025

DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition

Dec 30, 2024

TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch

Dec 12, 2024