Yong Zou

U2-KWS: Unified Two-pass Open-vocabulary Keyword Spotting with Keyword Bias

Dec 15, 2023

Ao Zhang, Pan Zhou, Kaixun Huang, Yong Zou, Ming Liu, Lei Xie

Abstract: Open-vocabulary keyword spotting (KWS), which allows users to customize keywords, has attracted increasing interest. However, existing methods based on an acoustic model plus post-processing train the acoustic model with ASR training criteria to model all phonemes, which leaves the acoustic model under-optimized for the KWS task. To solve this problem, we propose a novel unified two-pass open-vocabulary KWS (U2-KWS) framework inspired by the two-pass ASR model U2. Specifically, we employ the CTC branch as the first-stage model to detect potential keyword candidates and the decoder branch as the second-stage model to validate those candidates. To enhance arbitrary customized keywords, we redesign the U2 training procedure for U2-KWS and add keyword information to both branches through audio-text cross-attention. We perform experiments on our internal dataset and on Aishell-1. The results show that U2-KWS achieves a significant relative wake-up rate improvement of 41% over traditional customized KWS systems when the false alarm rate is fixed at 0.5 times per hour.
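The two-pass flow in the abstract can be illustrated with a minimal sketch. In the first pass, a cheap CTC branch proposes keyword candidates. In the second pass, the decoder verifies only those candidates. The sketch makes several assumptions:

- It takes a CTC posteriorgram as a NumPy array.
- It uses best-path (greedy) CTC decoding plus exact token matching as a stand-in for the paper's candidate detection.
- It models the second stage as a hypothetical `decoder_score` callable, for example the decoder's average log-probability of the keyword tokens.

The keyword-biasing cross-attention is not modeled. All function names are illustrative, not taken from the U2-KWS code.

```python
import numpy as np
from typing import Callable, List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank token


def ctc_greedy_decode(log_probs: np.ndarray) -> List[int]:
    """Best-path CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    best_path = log_probs.argmax(axis=-1)
    tokens: List[int] = []
    prev = None
    for tok in best_path:
        if tok != prev and tok != BLANK:
            tokens.append(int(tok))
        prev = tok
    return tokens


def contains_keyword(hyp: Sequence[int], keyword: Sequence[int]) -> bool:
    """True if the keyword token sequence occurs contiguously in the hypothesis."""
    n = len(keyword)
    return any(list(hyp[i:i + n]) == list(keyword) for i in range(len(hyp) - n + 1))


def two_pass_kws(
    ctc_log_probs: np.ndarray,
    keyword: Sequence[int],
    decoder_score: Callable[[Sequence[int]], float],  # hypothetical second-pass scorer
    threshold: float,
) -> bool:
    # First pass (cheap, runs on all audio): CTC proposes a keyword candidate.
    if not contains_keyword(ctc_greedy_decode(ctc_log_probs), keyword):
        return False
    # Second pass (only on candidates): the decoder validates the candidate.
    return bool(decoder_score(keyword) >= threshold)


# Toy usage: a hand-made posteriorgram whose best path is [0, 1, 1, 0, 2, 3, 3, 0].
path = [0, 1, 1, 0, 2, 3, 3, 0]
probs = np.full((len(path), 4), 0.02)
probs[np.arange(len(path)), path] = 0.94  # each row sums to 1.0
log_probs = np.log(probs)
print(ctc_greedy_decode(log_probs))  # [1, 2, 3]
print(two_pass_kws(log_probs, [2, 3], lambda kw: -0.2, threshold=-0.5))  # True
```

The design point is cost. The decoder runs only when the CTC pass fires, so the expensive verifier never sees most of the audio.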

* Accepted by ASRU2023
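The headline result is a wake-up rate measured at a fixed false-alarm budget of 0.5 per hour, and the 41% figure is a relative gain. Below is a minimal sketch of how such an operating point is commonly computed. It assumes one detection score per keyword utterance, a score for every detection fired on keyword-free audio, and the total duration of that audio in hours. The paper's exact evaluation protocol may differ, and the function names and toy numbers are hypothetical.

```python
import math
import numpy as np


def wakeup_rate_at_fa(pos_scores, neg_scores, neg_hours, max_fa_per_hour=0.5):
    """Wake-up rate on keyword utterances at the loosest threshold whose
    false-alarm rate on keyword-free audio stays <= max_fa_per_hour."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.sort(np.asarray(neg_scores, dtype=float))[::-1]  # highest first
    allowed = int(math.floor(max_fa_per_hour * neg_hours))  # false alarms we can afford
    if allowed >= len(neg):
        threshold = -math.inf  # budget can never be exceeded: accept everything
    else:
        # Just above the (allowed+1)-th highest negative score, so at most
        # `allowed` negatives score >= threshold.
        threshold = float(np.nextafter(neg[allowed], math.inf))
    return float(np.mean(pos >= threshold)), threshold


def relative_improvement(new, baseline):
    """Relative gain: e.g. 0.70 vs. a 0.50 baseline is a 40% relative improvement."""
    return (new - baseline) / baseline


# Toy numbers (not from the paper): 4 hours of negative audio allow 2 false alarms.
wur, thr = wakeup_rate_at_fa([0.95, 0.8, 0.6, 0.4], [0.85, 0.7, 0.5, 0.3, 0.1], neg_hours=4)
print(wur)  # 0.75
```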


SpikeSEE: An Energy-Efficient Dynamic Scenes Processing Framework for Retinal Prostheses

Sep 16, 2022

Chuanqing Wang, Chaoming Fang, Yong Zou, Jie Yang, Mohamad Sawan

[Figures 1-4 for SpikeSEE not reproduced]

Abstract: Intelligent, low-power retinal prostheses are in high demand in an era where wearable and implantable devices are used for numerous healthcare applications. In this paper, we propose an energy-efficient dynamic scenes processing framework (SpikeSEE) that combines a spike representation encoding technique with a bio-inspired spiking recurrent neural network (SRNN) model to achieve intelligent processing and extremely low-power computation for retinal prostheses. The spike representation encoding technique can interpret dynamic scenes as sparse spike trains, reducing the data volume. The SRNN model, inspired by the special structure of the human retina and its spike-based processing, is adopted to predict the response of ganglion cells to dynamic scenes. Experimental results show that the proposed SRNN model achieves a Pearson correlation coefficient of 0.93, outperforming the state-of-the-art processing framework for retinal prostheses. Thanks to the spike representation and SRNN processing, the model can extract visual features in a multiplication-free fashion. The framework achieves a 12-fold power reduction compared with a framework based on convolutional recurrent neural network (CRNN) processing. Our proposed SpikeSEE predicts the response of ganglion cells more accurately with lower energy consumption, which alleviates the precision and power issues of retinal prostheses and provides a potential solution for wearable or implantable prostheses.
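The abstract's "sparse spike trains" and "multiplication-free" claims can be made concrete with a minimal sketch. It rests on two assumptions, neither claimed to be the paper's exact design:

- **Encoding:** a DVS-style temporal-contrast scheme that emits ON/OFF spikes when a pixel's brightness change exceeds a threshold.
- **SRNN:** a generic leaky integrate-and-fire (LIF) recurrent layer.

`temporal_contrast_encode`, `LIFRecurrentLayer`, `pearson_r`, and their parameters are hypothetical.

Because the inputs are 0/1 spikes, the synaptic input reduces to adding up the weight rows of active neurons. The leak here still uses a scalar multiply, which could be replaced by a bit shift if the decay were a power-of-two fraction. The Pearson correlation coefficient used for evaluation is computed with `np.corrcoef`.

```python
import numpy as np


def temporal_contrast_encode(frames, threshold=0.1):
    """frames: (T, N) flattened pixel intensities -> (T-1, 2N) binary ON/OFF spikes."""
    frames = np.asarray(frames, dtype=float)
    diff = frames[1:] - frames[:-1]
    on = (diff > threshold).astype(np.int8)    # brightness increased
    off = (diff < -threshold).astype(np.int8)  # brightness decreased
    return np.concatenate([on, off], axis=-1)


class LIFRecurrentLayer:
    """Generic LIF recurrent layer (hypothetical stand-in for the paper's SRNN)."""

    def __init__(self, w_in, w_rec, v_th=1.0, decay=0.9):
        self.w_in = np.asarray(w_in, dtype=float)    # (n_in, n_hidden)
        self.w_rec = np.asarray(w_rec, dtype=float)  # (n_hidden, n_hidden)
        self.v_th = v_th
        self.decay = decay

    def run(self, spikes):
        n_hidden = self.w_in.shape[1]
        v = np.zeros(n_hidden)              # membrane potentials
        s = np.zeros(n_hidden, dtype=bool)  # spikes from the previous step
        outputs = []
        for x in np.asarray(spikes, dtype=bool):
            # Binary inputs: select and add weight rows, no multiplications.
            current = self.w_in[x].sum(axis=0) + self.w_rec[s].sum(axis=0)
            v = self.decay * v + current    # leaky integration
            s = v >= self.v_th              # fire
            v = np.where(s, 0.0, v)         # reset neurons that fired
            outputs.append(s.astype(np.int8))
        return np.stack(outputs)


def pearson_r(pred, target):
    """Pearson correlation between predicted and recorded responses."""
    return float(np.corrcoef(pred, target)[0, 1])


# Toy usage: two pixels over three frames.
frames = [[0.0, 0.0], [0.5, 0.0], [0.5, -0.5]]
spikes = temporal_contrast_encode(frames)  # ON for pixel 0, then OFF for pixel 1
layer = LIFRecurrentLayer(w_in=[[1.2, 0], [0, 0], [0, 0], [0, 1.2]], w_rec=np.zeros((2, 2)))
hidden = layer.run(spikes)  # neuron 0 fires at step 1, neuron 1 at step 2
```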
