Yaqiao Deng

Conformer-Based Speech Recognition On Extreme Edge-Computing Devices

Dec 16, 2023

Mingbin Xu, Alex Jin, Sicheng Wang, Mu Su, Tim Ng, Henry Mason, Shiyi Han, Yaqiao Deng, Zhen Huang, Mahesh Krishnamoorthy

Abstract: With increasingly powerful compute capabilities and resources in today's devices, traditionally compute-intensive automatic speech recognition (ASR) has been moving from the cloud to devices to better protect user privacy. However, it is still challenging to implement on-device ASR on resource-constrained devices, such as smartphones, smart wearables, and other small home automation devices. In this paper, we propose a series of model architecture adaptations, neural network graph transformations, and numerical optimizations to fit an advanced Conformer-based end-to-end streaming ASR system on resource-constrained devices without accuracy degradation. We achieve speech recognition over 5.26 times faster than real time (0.19 RTF) on small wearables while minimizing energy consumption and achieving state-of-the-art accuracy. The proposed methods are widely applicable to other transformer-based server-free AI applications. In addition, we provide a complete theory on optimal pre-normalizers that numerically stabilize layer normalization in any Lp-norm using any floating point precision.
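The two numerical claims in this abstract can be illustrated with a small sketch. The first function shows how a real-time factor (RTF) of 0.19 corresponds to roughly 5.26 times faster than real time. The rest shows why layer normalization can break in half precision and how scaling the input first helps. The max-abs scale used here is an assumption chosen for illustration; it is not necessarily the optimal pre-normalizer derived in the paper, and all function names are hypothetical.

```python
import numpy as np

def speedup_from_rtf(rtf: float) -> float:
    """RTF is processing time divided by audio duration, so speedup is its inverse."""
    return 1.0 / rtf

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Plain layer normalization with every intermediate kept in x's own dtype."""
    dt = x.dtype.type
    mean = x.mean(dtype=x.dtype)
    centered = x - mean
    # In float16 the squares overflow once |centered| exceeds about 255.
    var = (centered * centered).mean(dtype=x.dtype)
    return centered / np.sqrt(var + dt(eps))

def layer_norm_prenormalized(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Divide by max |x| first (an assumed pre-normalizer), then normalize.

    Layer normalization is scale invariant up to the effect of eps,
    so the result matches the unscaled computation when no overflow occurs.
    """
    scale = np.abs(x).max()
    if scale == 0:
        return np.zeros_like(x)
    return layer_norm(x / scale, eps)

x64 = np.array([-1000.0, -500.0, 500.0, 1000.0])
x16 = x64.astype(np.float16)

with np.errstate(over="ignore"):
    naive16 = layer_norm(x16)            # variance overflows to inf, output collapses to 0
stable16 = layer_norm_prenormalized(x16)  # stays finite and close to the reference
reference = layer_norm(x64)               # float64 result for comparison

print(round(speedup_from_rtf(0.19), 2))
print(naive16, stable16, reference)
```

Note that pre-scaling changes the relative weight of `eps`, so a production implementation would need to account for that; this sketch ignores it.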

Acoustic Model Fusion for End-to-end Speech Recognition

Oct 10, 2023

Zhihong Lei, Mingbin Xu, Shiyi Han, Leo Liu, Zhen Huang, Tim Ng, Yuanyuan Zhang, Ernest Pusateri, Mirko Hannemann, Yaqiao Deng (+1 more)

Abstract: Recent advances in deep learning and automatic speech recognition (ASR) have enabled the end-to-end (E2E) ASR system and boosted the accuracy to a new level. The E2E systems implicitly model all conventional ASR components, such as the acoustic model (AM) and the language model (LM), in a single network trained on audio-text pairs. Despite this simpler system architecture, fusing a separate LM, trained exclusively on text corpora, into the E2E system has proven to be beneficial. However, the application of LM fusion presents certain drawbacks, such as its inability to address the domain mismatch issue inherent to the internal AM. Drawing inspiration from the concept of LM fusion, we propose the integration of an external AM into the E2E system to better address the domain mismatch. By implementing this novel approach, we have achieved a significant reduction in the word error rate, with an impressive drop of up to 14.3% across varied test sets. We also discovered that this AM fusion approach is particularly beneficial in enhancing named entity recognition.
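For readers unfamiliar with fusion, the general idea of combining an E2E model's score with an external model's score can be sketched as a log-linear interpolation over candidate hypotheses. This is a generic shallow-fusion-style illustration only: the abstract does not specify how the external AM is integrated, so the scoring rule, the `weight` parameter, and the example probabilities below are all assumptions, not the paper's method or data.

```python
import math

def fused_score(log_p_e2e: float, log_p_ext: float, weight: float = 0.3) -> float:
    """Log-linear combination of the E2E score and an external model score."""
    return log_p_e2e + weight * log_p_ext

def rerank(hypotheses, weight: float = 0.3) -> str:
    """Return the text of the hypothesis with the highest fused score.

    Each hypothesis is a tuple (text, log_p_e2e, log_p_ext).
    """
    best = max(hypotheses, key=lambda h: fused_score(h[1], h[2], weight))
    return best[0]

# Made-up probabilities: the E2E model slightly prefers the wrong spelling,
# while the external model strongly prefers the right one.
hypotheses = [
    ("call jon", math.log(0.5), math.log(0.1)),
    ("call john", math.log(0.4), math.log(0.6)),
]

print(rerank(hypotheses, weight=0.0))  # E2E score alone
print(rerank(hypotheses, weight=0.5))  # external score included
```

With `weight=0.0` the ranking depends only on the E2E model; raising the weight lets the external model overturn close decisions, which is the kind of effect fusion aims for on mismatched domains.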

Training Large-Vocabulary Neural Language Models by Private Federated Learning for Resource-Constrained Devices

Jul 18, 2022

Mingbin Xu, Congzheng Song, Ye Tian, Neha Agrawal, Filip Granqvist, Rogier van Dalen, Xiao Zhang, Arturo Argueta, Shiyi Han, Yaqiao Deng (+3 more)

Abstract: Federated Learning (FL) is a technique to train models using data distributed across devices. Differential Privacy (DP) provides a formal privacy guarantee for sensitive data. Our goal is to train a large neural network language model (NNLM) on compute-constrained devices while preserving privacy using FL and DP. However, the DP-noise introduced to the model increases as the model size grows, which often prevents convergence. We propose Partial Embedding Updates (PEU), a novel technique to decrease noise by decreasing payload size. Furthermore, we adopt Low Rank Adaptation (LoRA) and Noise Contrastive Estimation (NCE) to reduce the memory demands of large models on compute-constrained devices. This combination of techniques makes it possible to train large-vocabulary language models while preserving accuracy and privacy.
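The link between payload size and DP noise can be made concrete with a rough sketch. It shows a standard LoRA layer, where a frozen weight matrix is adapted through two small trainable factors, and compares the number of trainable parameters and the root-mean-square norm of isotropic Gaussian noise added to them. The layer sizes, rank, and noise scale are arbitrary assumptions, and the sketch does not implement PEU, NCE, or the paper's actual training setup.

```python
import numpy as np

def lora_forward(x, W, A, B, alpha: float = 1.0):
    """Compute x @ (W + (alpha / r) * B @ A).T without forming the full update.

    W: frozen weights, shape (d_out, d_in)
    A: trainable, shape (r, d_in); B: trainable, shape (d_out, r)
    """
    r = A.shape[0]
    return x @ W.T + (alpha / r) * ((x @ A.T) @ B.T)

def trainable_params(d_out: int, d_in: int, r: int) -> dict:
    """Parameters sent per update: the full matrix versus the two LoRA factors."""
    return {"full": d_out * d_in, "lora": r * (d_out + d_in)}

def rms_noise_norm(num_params: int, sigma: float) -> float:
    """Root-mean-square L2 norm of N(0, sigma^2 I) noise over num_params values."""
    return sigma * float(np.sqrt(num_params))

rng = np.random.default_rng(0)
d_out, d_in, r = 1000, 512, 8          # assumed sizes for illustration
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = np.zeros((d_out, r))               # usual LoRA init: the update starts at zero
x = rng.normal(size=(4, d_in))

counts = trainable_params(d_out, d_in, r)
print(counts)
print(rms_noise_norm(counts["full"], 1.0), rms_noise_norm(counts["lora"], 1.0))
```

Because the clipped update has a bounded norm regardless of its dimension, while the noise norm grows with the square root of the number of values sent, a smaller payload leaves a better signal-to-noise ratio at the same noise scale.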
