Anish Shah

Towards Data-efficient Modeling for Wake Word Spotting

Oct 13, 2020

Yixin Gao, Yuriy Mishchenko, Anish Shah, Spyros Matsoukas, Shiv Vitaladevuni

Abstract: Wake word (WW) spotting is challenging in far-field conditions, not only because of interference in signal transmission but also because of the complexity of acoustic environments. Traditional WW model training requires a large amount of in-domain, WW-specific data with substantial human annotation, so it is hard to build WW models without such data. In this paper, we present data-efficient solutions to address challenges in WW modeling such as domain mismatch, noisy conditions, and limited annotation. Our proposed system is composed of a multi-condition training pipeline with stratified data augmentation, which improves model robustness to a variety of predefined acoustic conditions, together with a semi-supervised learning pipeline that accurately extracts WW and confusable examples from an untranscribed speech corpus. Starting from only 10 hours of domain-mismatched WW audio, we are able to enlarge and enrich the training dataset by 20-100 times to capture the acoustic complexity. Our experiments on real user data show that the proposed solutions achieve performance comparable to that of a production-grade model while saving 97% of the WW-specific data collection and 86% of the annotation bandwidth.

* Proc. ICASSP 2020
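
The abstract above does not say how its augmentation is implemented. As a rough illustration of the general idea behind multi-condition training, the sketch below mixes a noise recording into clean speech at several target signal-to-noise ratios (SNRs). The function names (`mean_power`, `mix_at_snr`, `augment`) and the choice of additive noise mixing are assumptions made for this example, not the authors' pipeline.

```python
import math


def mean_power(signal):
    """Mean power (average of squared samples) of a sequence of floats."""
    return sum(s * s for s in signal) / len(signal)


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the mixture has the requested SNR in dB.

    Hypothetical helper: additive mixing is one common augmentation,
    not necessarily the one used in the paper.
    """
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[:len(speech)]
    noise_power = mean_power(noise)
    if noise_power == 0:
        raise ValueError("noise must not be silent")
    # SNR_dB = 10 * log10(P_speech / P_noise), so solve for the noise power we need.
    target_noise_power = mean_power(speech) / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / noise_power)
    return [s + gain * n for s, n in zip(speech, noise)]


def augment(speech, noise, snr_levels_db):
    """Return one noisy copy of the utterance per predefined SNR condition."""
    return {snr: mix_at_snr(speech, noise, snr) for snr in snr_levels_db}
```

Calling `augment(speech, noise, [0, 10, 20])` would yield three noisy variants of one utterance, which is the kind of multiplication of training data the abstract alludes to, although the actual system may use different or additional conditions.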

Accurate Detection of Wake Word Start and End Using a CNN

Aug 09, 2020

Christin Jose, Yuriy Mishchenko, Thibaud Senechal, Anish Shah, Alex Escott, Shiv Vitaladevuni

Abstract: Small-footprint embedded devices require keyword spotters (KWS) with small model size and low detection latency to enable voice assistants. Such a keyword is often referred to as a wake word, as it is used to wake up voice-assistant-enabled devices. Together with wake word detection, accurate estimation of wake word endpoints (start and end) is an important task of KWS. In this paper, we propose two new methods for detecting the endpoints of wake words in neural KWS that use single-stage word-level neural networks. Our results show that the new techniques give superior accuracy in detecting wake word endpoints, with up to 50 msec standard error versus human annotations, on par with conventional acoustic model plus HMM forced alignment. To our knowledge, this is the first study of wake word endpoint detection methods for single-stage neural KWS.

* Proceedings of INTERSPEECH

Deep Residual Networks with Exponential Linear Unit

Oct 05, 2016

Anish Shah, Eashan Kadam, Hena Shah, Sameer Shinde, Sandip Shingade

Abstract: Very deep convolutional neural networks introduced new problems such as vanishing gradients and degradation. Recent successful contributions towards solving these problems are Residual and Highway Networks. These networks introduce skip connections that allow information (from the input or learned in earlier layers) to flow more readily into the deeper layers. These very deep models have led to a considerable decrease in test errors on benchmarks like ImageNet and COCO. In this paper, we propose the use of the exponential linear unit (ELU) instead of the combination of ReLU and Batch Normalization in Residual Networks. We show that this not only speeds up learning in Residual Networks but also improves accuracy as the depth increases. It improves the test error on almost all data sets, like CIFAR-10 and CIFAR-100.

* Submitted to Vision Net 2016, Jaipur, India
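
To make the idea in the abstract above concrete, here is a minimal sketch of the exponential linear unit and of a residual block that uses it with no batch normalization. The ELU formula is the standard one: the input itself when positive, otherwise alpha times (exp(x) minus 1). Everything else is an assumption for illustration: fully connected layers stand in for convolutions, no activation is applied after the skip addition, and the names `dense` and `residual_block` are hypothetical. The paper's actual block layout may differ.

```python
import math


def elu(x, alpha=1.0):
    """Exponential linear unit: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    return x if x > 0 else alpha * (math.exp(x) - 1.0)


def dense(vec, weights, bias):
    """Fully connected layer; stands in for a convolution in this sketch."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def residual_block(x, w1, b1, w2, b2, alpha=1.0):
    """Compute x + F(x), where F is layer -> ELU -> layer.

    Assumed layout: no batch normalization and no activation after the
    addition, so the skip path carries x through unchanged.
    """
    hidden = [elu(v, alpha) for v in dense(x, w1, b1)]
    residual = dense(hidden, w2, b2)
    return [xi + ri for xi, ri in zip(x, residual)]
```

With all weights and biases set to zero the block reduces to the identity mapping, which is the property of skip connections that the abstract credits with easing the training of very deep models.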