Title: DiarizationLM: Speaker Diarization Post-Processing with Large Language Models

Jan 16, 2024

Quan Wang, Yiling Huang, Guanlong Zhao, Evan Clark, Wei Xia, Hank Liao

Abstract: In this paper, we introduce DiarizationLM, a framework that leverages large language models (LLMs) to post-process the outputs of a speaker diarization system. The framework can serve various goals, such as improving the readability of the diarized transcript or reducing the word diarization error rate (WDER). The outputs of the automatic speech recognition (ASR) and speaker diarization systems are represented in a compact textual format, which is included in the prompt to an optionally finetuned LLM. The outputs of the LLM can then be used as the refined diarization results with the desired enhancement. As a post-processing step, this framework can be easily applied to any off-the-shelf ASR and speaker diarization system without retraining existing components. Our experiments show that a finetuned PaLM 2-S model reduces the WDER by a relative 55.5% on the Fisher telephone conversation dataset and by a relative 44.9% on the Callhome English dataset.
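To make the "compact textual format" idea concrete, here is a minimal Python sketch of how word-level speaker labels might be serialized into a prompt string and parsed back from an LLM's output. The abstract does not specify the format, so the `<spk:N>` tag syntax, the function names, and the simplified error metric below are all illustrative assumptions, not the authors' implementation. In particular, `simplified_wder` assumes the reference and hypothesis contain the same words in the same order with consistently mapped speaker IDs, which sidesteps the alignment a real WDER computation needs.

```python
from typing import List, Tuple


def to_compact_text(words: List[str], speakers: List[int]) -> str:
    """Serialize words and per-word speaker IDs into one string.

    A speaker tag (hypothetical "<spk:N>" syntax) is emitted only when
    the speaker changes, which keeps the prompt short.
    """
    if len(words) != len(speakers):
        raise ValueError("words and speakers must have the same length")
    parts: List[str] = []
    previous = None
    for word, speaker in zip(words, speakers):
        if speaker != previous:
            parts.append(f"<spk:{speaker}>")
            previous = speaker
        parts.append(word)
    return " ".join(parts)


def from_compact_text(text: str) -> Tuple[List[str], List[int]]:
    """Parse the compact string back into words and per-word speaker IDs."""
    words: List[str] = []
    speakers: List[int] = []
    current = None
    for token in text.split():
        if token.startswith("<spk:") and token.endswith(">"):
            current = int(token[len("<spk:"):-1])
        else:
            if current is None:
                raise ValueError("word appears before any speaker tag")
            words.append(token)
            speakers.append(current)
    return words, speakers


def simplified_wder(ref_speakers: List[int], hyp_speakers: List[int]) -> float:
    """Fraction of words whose speaker label differs from the reference.

    Simplification (assumption): both sequences label the same words in
    the same order, so no word alignment or speaker mapping is done.
    """
    if len(ref_speakers) != len(hyp_speakers) or not ref_speakers:
        raise ValueError("need two non-empty sequences of equal length")
    wrong = sum(r != h for r, h in zip(ref_speakers, hyp_speakers))
    return wrong / len(ref_speakers)


def relative_reduction(before: float, after: float) -> float:
    """Relative reduction of an error rate: (before - after) / before."""
    if before <= 0:
        raise ValueError("baseline error rate must be positive")
    return (before - after) / before


if __name__ == "__main__":
    # Hypothetical toy transcript, not data from the paper.
    words = ["hello", "how", "are", "you", "good", "thanks"]
    hyp = [1, 1, 1, 2, 2, 2]  # diarizer assigns "you" to speaker 2
    ref = [1, 1, 1, 1, 2, 2]  # "you" actually belongs to speaker 1
    print(to_compact_text(words, hyp))
    print(simplified_wder(ref, hyp))
```

A "relative" reduction, as reported in the abstract, is measured against the baseline error rate rather than in absolute percentage points; `relative_reduction` shows that arithmetic for arbitrary inputs.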
