Title: Monotonic segmental attention for automatic speech recognition

Oct 26, 2022

Albert Zeyer, Robin Schmitt, Wei Zhou, Ralf Schlüter, Hermann Ney

Abstract: We introduce a novel segmental-attention model for automatic speech recognition. We restrict the decoder attention to segments to avoid quadratic runtime of global attention, better generalize to long sequences, and eventually enable streaming. We directly compare global-attention and different segmental-attention modeling variants. We develop and compare two separate time-synchronous decoders, one specifically taking the segmental nature into account, yielding further improvements. Using time-synchronous decoding for segmental models is novel and a step towards streaming applications. Our experiments show the importance of a length model to predict the segment boundaries. The final best segmental-attention model using segmental decoding performs better than global-attention, in contrast to other monotonic attention approaches in the literature. Further, we observe that the segmental model generalizes much better to long sequences of up to several minutes.

* accepted at SLT: https://slt2022.org/
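
The core idea in the abstract, restricting each decoder step's attention to one segment of encoder frames instead of the whole sequence, can be illustrated with a small NumPy sketch. This is not the authors' implementation: the dot-product scoring, the function names `global_attention` and `segmental_attention`, and the hard-coded `boundaries` list are all assumptions made for illustration. In the paper, segment boundaries are predicted by a length model rather than given.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    x = x - np.max(x)
    e = np.exp(x)
    return e / e.sum()

def global_attention(query, enc):
    """Attend over all T encoder frames (cost grows with T per output label)."""
    weights = softmax(enc @ query)
    return weights @ enc, weights

def segmental_attention(query, enc, seg_start, seg_end):
    """Attend only over frames seg_start..seg_end (inclusive)."""
    segment = enc[seg_start:seg_end + 1]
    weights = softmax(segment @ query)
    return weights @ segment, weights

rng = np.random.default_rng(0)
T, D = 12, 4                        # toy sizes: 12 encoder frames, dimension 4
enc = rng.normal(size=(T, D))       # stand-in for encoder outputs
queries = rng.normal(size=(3, D))   # stand-in for decoder states, one per label

# Hypothetical segment end frames, one per output label. They increase
# monotonically, so consecutive segments never overlap or go backwards.
boundaries = [3, 7, 11]

start = 0
contexts = []
for q, end in zip(queries, boundaries):
    ctx, w = segmental_attention(q, enc, start, end)
    contexts.append(ctx)
    start = end + 1                 # the next segment begins after this one
```

Because each label only looks at its own segment, the total attention cost in this sketch scales with the number of frames rather than with frames times labels, and the context for a label depends on no frames beyond its segment end, which is the property that makes streaming plausible.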
