Title: Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech

May 10, 2022

Ilya Sklyar, Anna Piunova, Christian Osendorfer

Abstract: Streaming recognition and segmentation of multi-party conversations with overlapping speech is crucial for the next generation of voice assistant applications. In this work, we address the challenges identified in previous work on the multi-turn recurrent neural network transducer (MT-RNN-T) with a novel approach, the separator-transducer-segmenter (STS), which enables tighter integration of speech separation, recognition, and segmentation in a single model. First, we propose a new segmentation modeling strategy based on start-of-turn and end-of-turn tokens that improves segmentation without degrading recognition accuracy. Second, we further improve both speech recognition and segmentation accuracy through an emission regularization method, FastEmit, and multi-task training with speech activity information as an additional training signal. Third, we experiment with an end-of-turn emission latency penalty to improve end-point detection for each speaker turn. Finally, we establish a novel framework for segmentation analysis of multi-party conversations through emission latency metrics. With our best model, we report a 4.6% absolute improvement in turn counting accuracy and a 17% relative improvement in word error rate (WER) on the LibriCSS dataset compared to previously published work.

* Submitted to InterSpeech 2022
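
To make the abstract's terminology concrete, the following is a minimal sketch of how start-of-turn and end-of-turn tokens in a single serialized hypothesis could be used to recover and count speaker turns, and of how a relative error-rate improvement differs from an absolute one. It is not the authors' implementation: the token spellings `<sot>` and `<eot>`, the helper names `split_turns` and `relative_improvement`, and all numbers are assumptions chosen purely for illustration.

```python
from typing import List, Optional

SOT = "<sot>"  # hypothetical start-of-turn token spelling (assumption)
EOT = "<eot>"  # hypothetical end-of-turn token spelling (assumption)


def split_turns(tokens: List[str]) -> List[List[str]]:
    """Group words into turns delimited by SOT ... EOT markers.

    Words outside a SOT/EOT pair are ignored, and a turn that is
    opened but never closed is dropped.
    """
    turns: List[List[str]] = []
    current: Optional[List[str]] = None  # None means "outside any turn"
    for tok in tokens:
        if tok == SOT:
            current = []
        elif tok == EOT:
            if current is not None:
                turns.append(current)
            current = None
        elif current is not None:
            current.append(tok)
    return turns


def relative_improvement(baseline: float, new: float) -> float:
    """Relative reduction of an error metric such as WER, as a fraction."""
    return (baseline - new) / baseline


# Illustrative hypothesis, not model output.
hyp = "<sot> hello there <eot> <sot> hi <eot>".split()
turns = split_turns(hyp)
turn_count = len(turns)

# Illustrative numbers only: a WER drop from 20.0 to 15.0 is
# 5.0 points absolute but 25% relative.
rel = relative_improvement(20.0, 15.0)
```

Under these assumptions, a turn counting metric would compare `turn_count` with the number of turns in the reference transcript. The distinction between absolute and relative change is why the abstract labels its two headline numbers differently.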
