The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token. However, frequent speaker changes can make speaker change prediction difficult. To address this, we propose boundary-aware serialized output training (BA-SOT), which explicitly incorporates boundary knowledge into the decoder via a speaker change detection task and a boundary constraint loss. We also introduce a two-stage connectionist temporal classification (CTC) strategy that incorporates token-level SOT CTC to restore temporal context information. Besides the typical character error rate (CER), we introduce the utterance-dependent character error rate (UD-CER) to further measure the precision of speaker change prediction. Compared to the original SOT, BA-SOT reduces CER/UD-CER by 5.1%/14.0%, and leveraging a pre-trained ASR model to initialize the BA-SOT model further reduces CER/UD-CER by 8.4%/19.9%.
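For readers unfamiliar with SOT, the sketch below illustrates the kind of target-side serialization the abstract refers to: the transcriptions of all speakers are concatenated into one token sequence, with a speaker-change token between turns. The token name "<sc>" and the ordering of turns by start time follow the common SOT convention and are assumptions here, not details taken from this paper.

# Minimal sketch (illustration only): serializing a two-talker mixture into a
# single SOT-style reference, assuming a "<sc>" speaker-change token and
# first-in-first-out ordering of turns by their start time.

def serialize_sot(utterances, sc_token="<sc>"):
    """utterances: list of (start_time, text) pairs, one per speaker turn."""
    ordered = sorted(utterances, key=lambda u: u[0])          # order turns by start time
    return f" {sc_token} ".join(text for _, text in ordered)  # join with the speaker-change token

# Example: two overlapping speakers become one target sequence.
mixture = [(0.0, "how are you"), (1.2, "i am fine thanks")]
print(serialize_sot(mixture))
# -> "how are you <sc> i am fine thanks"

With targets in this form, predicting "<sc>" at the right positions is exactly the speaker change prediction that BA-SOT aims to improve via its boundary-aware training objectives.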