Title: TDT-KWS: Fast And Accurate Keyword Spotting Using Token-and-duration Transducer

Mar 20, 2024

Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu

Abstract: Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention. Existing KWS search algorithms typically follow a frame-synchronous approach, where search decisions are made repeatedly at each frame despite the fact that most frames are keyword-irrelevant. In this paper, we propose TDT-KWS, which leverages token-and-duration Transducers (TDT) for KWS tasks. We also propose a novel KWS task-specific decoding algorithm for Transducer-based models, which supports highly effective frame-asynchronous keyword search in streaming speech scenarios. With evaluations conducted on both the public Hey Snips and self-constructed LibriKWS-20 datasets, our proposed KWS-decoding algorithm produces more accurate results than conventional ASR decoding algorithms. Additionally, TDT-KWS achieves on-par or better wake word detection performance than both RNN-T and traditional TDT-ASR systems while achieving significant inference speed-up. Furthermore, experiments show that TDT-KWS is more robust to noisy environments compared to RNN-T KWS.
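
The contrast between frame-synchronous and frame-asynchronous search can be illustrated with a toy sketch. This is not the paper's decoding algorithm: the function names and the per-frame duration values are hypothetical, and no acoustic model or keyword scoring is involved. It only counts how many frames a search loop visits when it either steps through every frame or jumps ahead by a predicted duration, which is the general idea behind a token-and-duration Transducer.

```python
from typing import List, Tuple


def frame_synchronous_steps(num_frames: int) -> int:
    """Count search decisions when every frame is visited once."""
    steps = 0
    t = 0
    while t < num_frames:
        steps += 1  # one search decision at this frame
        t += 1      # always advance by exactly one frame
    return steps


def frame_asynchronous_steps(durations: List[int]) -> Tuple[int, List[int]]:
    """Count search decisions when the loop skips ahead by a duration.

    durations[t] is a hypothetical predicted duration (in frames) at frame t.
    Assumption: durations are clamped to at least 1 so the loop always
    advances; a real TDT model may handle a zero duration differently.
    """
    steps = 0
    visited = []
    t = 0
    while t < len(durations):
        visited.append(t)
        steps += 1                   # one search decision at this frame
        t += max(1, durations[t])    # jump over the skipped frames
    return steps, visited


# Hypothetical durations for a 10-frame utterance (made up for illustration).
toy_durations = [4, 1, 1, 1, 2, 1, 1, 3, 1, 1]

sync_steps = frame_synchronous_steps(len(toy_durations))
async_steps, visited_frames = frame_asynchronous_steps(toy_durations)
print(sync_steps, async_steps, visited_frames)
```

With these made-up durations the asynchronous loop visits only a subset of the frames. Any actual speed-up depends on the durations a trained model predicts.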

* Accepted by ICASSP 2024
