Abstract: This paper introduces what is, to the best of the authors' knowledge, the first keyword spotting (KWS) IC that exploits fine-grained temporal sparsity. It leverages the temporal similarity between neighboring feature vectors extracted from input frames, and between successive network hidden states, to eliminate unnecessary operations and memory accesses. The IC features a bio-inspired delta-gated recurrent neural network ($\Delta$RNN) classifier and achieves 90.5% accuracy on the 11-class Google Speech Command Dataset (GSCD) KWS task with an energy consumption of 36 nJ/decision. At 87% temporal sparsity, computing latency and energy per inference are reduced by 2.4$\times$ and 3.4$\times$, respectively. The 65 nm design occupies 0.78 mm$^2$ and includes two additional blocks: a compact 0.084 mm$^2$ digital infinite-impulse-response (IIR) band-pass filter (BPF)-based audio feature extractor (FEx), and a 24 kB, 0.6 V near-$V_{th}$ weight SRAM with 6.6$\times$ lower read power than a standard SRAM.
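The delta-threshold principle behind $\Delta$RNN temporal sparsity can be sketched in a few lines. The following is a minimal, hypothetical illustration in plain Python, not the IC's datapath: the class name `DeltaMatVec`, the threshold value, and the omission of biases and gates are all assumptions. Each vector element is propagated only when it differs from its last propagated value by more than a threshold $\theta$, and the matrix-vector product is updated incrementally using only the weight columns of the elements that changed. Skipped columns stand in for the weight fetches and multiply-accumulates that temporal sparsity avoids; in a $\Delta$RNN the same rule is applied to both the input features and the hidden state.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # row-major: W[row][col]


class DeltaMatVec:
    """Hypothetical sketch: incrementally maintains acc = W @ x_ref, where
    x_ref only follows the input x when an element changes by more than
    `threshold` (delta-threshold rule). Biases and RNN gates are omitted."""

    def __init__(self, weights: Matrix, threshold: float) -> None:
        self.w = weights
        self.theta = threshold
        self.x_ref = [0.0] * len(weights[0])  # last propagated value per element
        self.acc = [0.0] * len(weights)       # running product W @ x_ref
        self.skipped = 0                      # element updates below threshold
        self.total = 0                        # all element updates seen

    def step(self, x: Vector) -> Vector:
        for j, xj in enumerate(x):
            self.total += 1
            delta = xj - self.x_ref[j]
            if abs(delta) <= self.theta:
                # Below threshold: column j of W is never read this step.
                self.skipped += 1
                continue
            # Above threshold: update the reference and add W[:, j] * delta.
            self.x_ref[j] = xj
            for i, row in enumerate(self.w):
                self.acc[i] += row[j] * delta
        return list(self.acc)

    def sparsity(self) -> float:
        """Fraction of element updates that were skipped."""
        return self.skipped / self.total if self.total else 0.0
```

For example, with `W = [[1.0, 2.0], [3.0, 4.0]]`, $\theta = 0.1$, and the frames `[1.0, 0.5]`, `[1.05, 0.5]`, `[1.05, 0.9]`, three of the six element updates fall below the threshold, so `sparsity()` returns 0.5, while the output still equals `W` applied to the stored reference vector (about `[2.8, 6.6]`). The error between `x` and `x_ref` never exceeds $\theta$ per element, which is what bounds the approximation.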
Abstract: This article presents the first keyword spotting (KWS) IC that uses a ring-oscillator-based time-domain processing technique for its analog feature extractor (FEx). Its extensive use of time-encoding schemes allows the analog audio signal to be processed entirely in the time domain, except for the voltage-to-time conversion stage of the analog front-end. Because its fundamental building blocks are based on digital logic gates, it offers better technology scalability than conventional voltage-domain designs. Fabricated in a 65 nm CMOS process, the prototype KWS IC occupies 2.03 mm$^{2}$ and consumes 23 $\mu$W, including the analog FEx and the digital neural network classifier. The 16-channel time-domain FEx achieves a 54.89 dB dynamic range with a 16 ms frame shift while consuming 9.3 $\mu$W. Measurement results verify that the proposed IC performs a 12-class KWS task on the Google Speech Command Dataset (GSCD) with >86% accuracy and 12.4 ms latency.
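To illustrate the time-encoding idea, the sketch below is an idealized behavioral model of a ring-oscillator (VCO-based) quantizer in plain Python. It is an assumption-laden illustration, not the article's circuit: the linear tuning law $f = f_0 + K_v v$, the function and parameter names, and the 16 kHz sampling rate with a 256-sample (16 ms) frame are illustrative choices. The input voltage sets the oscillation frequency, the oscillator phase is integrated sample by sample, and the number of completed cycles in each frame is the digital output. Because the phase is never reset, the fractional cycle left at the end of one frame carries into the next, which is the origin of the first-order noise shaping commonly attributed to VCO-based quantizers.

```python
import math
from typing import List


def vco_frame_counts(samples: List[float], fs: float, f0: float,
                     kv: float, frame_len: int) -> List[int]:
    """Hypothetical idealized ring-oscillator quantizer.

    samples   -- input voltage per sample (V)
    fs        -- sampling rate (Hz)
    f0        -- free-running oscillation frequency at 0 V (Hz), assumed
    kv        -- linear tuning gain (Hz/V), assumed
    frame_len -- samples per output frame
    Returns the number of completed oscillator cycles in each full frame;
    trailing samples that do not fill a frame are ignored.
    """
    phase = 0.0       # accumulated phase in cycles; never reset between frames
    prev_cycles = 0   # completed cycles counted up to the previous frame
    counts: List[int] = []
    for n, v in enumerate(samples, start=1):
        # Frequency set by the input, integrated over one sample period.
        phase += (f0 + kv * v) / fs
        if n % frame_len == 0:
            cycles = math.floor(phase)           # edges seen so far
            counts.append(cycles - prev_cycles)  # edges in this frame
            prev_cycles = cycles                 # residual phase carries over
    return counts
```

With $f_0 = 1$ MHz, $K_v = 100$ kHz/V, and a constant 0.5 V input, `vco_frame_counts([0.5] * 512, 16000.0, 1e6, 1e5, 256)` counts 16,800 cycles in each 16 ms frame, i.e. $1.05\,\text{MHz} \times 16\,\text{ms}$. A real design would also have to handle oscillator nonlinearity, phase noise, and multi-phase edge counting, none of which this model captures.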
Abstract: Silicon cochlea designs capture the functionality of the biological cochlea. Their use has been explored for cochlear prosthesis applications and, more recently, in edge audio devices that must support always-on operation. Because the stringent power constraints of these devices pose several design challenges, IC designers are forced to seek solutions with low standby power. One promising bio-inspired approach combines the continuous-time analog filter channels of the silicon cochlea with a deep neural network that has a small memory footprint and is trained on edge tasks such as keyword spotting, thereby allowing all blocks to be embedded in an IC. This paper reviews the analog filter circuits used as feature extractors in current edge audio devices, starting with the original biquad filter circuits proposed for the silicon cochlea. Our analysis starts from the interpretation of a basic biquad filter as a two-integrator-loop topology and reviews the progression in the design of second-order low-pass and band-pass filters, from OTA-based to source-follower-based architectures. We also derive and analyze the small-signal transfer functions and discuss performance aspects of these filters. The analysis of these filter configurations can also be applied to other domains, such as biomedical devices that employ a front-end band-pass filter.
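As a worked example of the two-integrator-loop interpretation, consider an idealized textbook state-variable biquad (an assumption for illustration, not one of the specific circuits reviewed): two ideal integrators with unity-gain frequency $\omega_0$, with damping set by a feedback coefficient $1/Q$. Writing the node equations for the band-pass output $V_{bp}$ and the low-pass output $V_{lp}$, then eliminating $V_{lp}$, yields both second-order responses:

$$
\begin{aligned}
V_{bp} &= \frac{\omega_0}{s}\left(V_{in} - \frac{V_{bp}}{Q} - V_{lp}\right), \qquad V_{lp} = \frac{\omega_0}{s}\,V_{bp} \\
\Rightarrow\quad V_{bp}\left(s^2 + \frac{\omega_0}{Q}\,s + \omega_0^2\right) &= \omega_0\, s\, V_{in} \\
H_{bp}(s) = \frac{V_{bp}}{V_{in}} = \frac{\omega_0\, s}{s^2 + (\omega_0/Q)\,s + \omega_0^2}, &\qquad
H_{lp}(s) = \frac{V_{lp}}{V_{in}} = \frac{\omega_0^2}{s^2 + (\omega_0/Q)\,s + \omega_0^2}
\end{aligned}
$$

Evaluating these gives $H_{bp}(j\omega_0) = Q$ and $H_{lp}(0) = 1$, so in this idealized loop $\omega_0$ sets the center (or corner) frequency, while $Q$ sets both the band-pass peak gain and its selectivity, with a $-3$ dB bandwidth of $\omega_0/Q$.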