Márius Šajgalík

Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction

Oct 20, 2020

Daria Soboleva, Ondrej Skopek, Márius Šajgalík, Victor Cărbune, Felix Weissenberger, Julia Proskurnia, Bogdan Prisacari, Daniel Valcarce, Justin Lu, Rohit Prabhavalkar (+1 more)

Figures 1-4 for Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction

Abstract: We present a novel multi-modal unspoken punctuation prediction system for the English language which combines acoustic and text features. We demonstrate for the first time that, by relying exclusively on synthetic data generated using a prosody-aware text-to-speech system, we can outperform a model trained with expensive human audio recordings on the unspoken punctuation prediction problem. Our model architecture is well suited for on-device use. This is achieved by leveraging hash-based embeddings of automatic speech recognition text output in conjunction with acoustic features as input to a quasi-recurrent neural network, keeping the model size small and latency low.
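To make the idea of hash-based embeddings concrete, here is a minimal sketch of how recognized tokens might be mapped into a fixed-size embedding table and joined with per-token acoustic features. It is an illustration under stated assumptions, not the authors' implementation: the table size, embedding width, MD5 hashing, and the two example acoustic features (pause length and pitch) are all hypothetical, and the quasi-recurrent network that would consume these inputs is omitted.

```python
import hashlib
import random

NUM_BUCKETS = 1024  # hypothetical table size, fixed regardless of vocabulary
EMBED_DIM = 8       # hypothetical embedding width

# Randomly initialized table standing in for learned embeddings.
_rng = random.Random(0)
EMBEDDINGS = [
    [_rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)]
    for _ in range(NUM_BUCKETS)
]


def bucket(token):
    """Map a token to a table row with a stable hash (no vocabulary file needed)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_BUCKETS


def build_inputs(tokens, acoustic):
    """Concatenate each token's hashed embedding with its acoustic features."""
    if len(tokens) != len(acoustic):
        raise ValueError("need one acoustic feature vector per token")
    return [EMBEDDINGS[bucket(t)] + list(a) for t, a in zip(tokens, acoustic)]


# Hypothetical ASR output with (pause seconds, mean pitch in Hz) per token.
tokens = ["how", "are", "you"]
acoustic = [[0.05, 180.0], [0.02, 175.0], [0.40, 210.0]]
inputs = build_inputs(tokens, acoustic)
```

Because the table has a fixed number of rows, its memory footprint does not grow with the vocabulary, which is one plausible reason such embeddings suit small on-device models. The trade-off is that distinct tokens can collide in the same bucket.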
