Title: Streaming Attention-Based Models with Augmented Memory for End-to-End Speech Recognition

Nov 03, 2020

Ching-Feng Yeh, Yongqiang Wang, Yangyang Shi, Chunyang Wu, Frank Zhang, Julian Chan, Michael L. Seltzer

Abstract: Attention-based models have been gaining popularity for their strong performance in fields such as machine translation and automatic speech recognition. One major challenge of attention-based models is that they need access to the full sequence and incur a computational cost that grows quadratically with sequence length. These characteristics pose challenges, especially in low-latency scenarios, where the system is often required to be streaming. In this paper, we build a compact, streaming speech recognition system on top of the end-to-end neural transducer architecture with attention-based modules augmented with convolution. The proposed system equips the end-to-end models with streaming capability and reduces the large footprint of the streaming attention-based model that uses augmented memory. On the LibriSpeech dataset, our proposed system achieves word error rates of 2.7% on test-clean and 5.8% on test-other, which are, to the best of our knowledge, the lowest among streaming approaches reported so far.
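To make the quadratic-cost point concrete, the following minimal sketch counts how many query-key pairs are scored under full-sequence attention versus a segment-based streaming scheme. It is an illustration only, not the paper's implementation: the function names, the segment length, the left-context size, and the assumption that the memory bank holds one summary slot per past segment (capped at a fixed size) are all hypothetical choices made for this example.

```python
def full_attention_pairs(num_frames: int) -> int:
    """Query-key pairs scored when every frame attends to every frame."""
    return num_frames * num_frames


def segmented_attention_pairs(
    num_frames: int, segment: int, left_context: int, memory_slots: int
) -> int:
    """Pairs scored when each frame attends only to its own segment,
    a fixed left context, and a bounded memory bank.

    Assumption (hypothetical): one memory slot is produced per past
    segment, and at most `memory_slots` of them are kept.
    """
    total = 0
    for start in range(0, num_frames, segment):
        queries = min(segment, num_frames - start)   # frames in this segment
        left = min(left_context, start)              # frames available to the left
        memory = min(memory_slots, start // segment) # summaries of past segments
        keys = queries + left + memory
        total += queries * keys
    return total


if __name__ == "__main__":
    for n in (1000, 2000):
        full = full_attention_pairs(n)
        seg = segmented_attention_pairs(n, segment=100, left_context=50, memory_slots=4)
        print(f"{n} frames: full={full}, segmented={seg}")
```

With these illustrative settings, doubling the sequence length quadruples the full-attention count but only roughly doubles the segmented count, because the number of keys visible to each frame is bounded.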

* IEEE Spoken Language Technology Workshop 2021
