Dillon Knox

A neural prosody encoder for end-to-end dialogue act classification

May 11, 2022

Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo

Abstract: Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems. Prosodic features such as energy and pitch have been shown to be useful for DAC. Despite their importance, little research has explored neural approaches to integrating prosodic features into end-to-end (E2E) DAC models, which infer dialogue acts directly from audio signals. In this work, we propose an E2E neural architecture that accounts for the need to characterize prosodic phenomena co-occurring at different levels inside an utterance. A novel part of this architecture is a learnable gating mechanism that assesses the importance of prosodic features and selectively retains the core information necessary for E2E DAC. Our proposed model improves DAC accuracy by 1.07% absolute across three publicly available benchmark datasets.
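
The abstract does not give the form of the learnable gate. As a rough illustration only, here is a minimal sketch assuming a sigmoid gate computed from the concatenated acoustic and prosodic feature vectors of a single frame, in the spirit of GRU or highway-network gates. The function `gated_prosody`, the weight layout, and the feature values are hypothetical and are not the authors' architecture.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def gated_prosody(acoustic, prosody, weights, bias):
    """Hypothetical gate: g = sigmoid(W [a; p] + b), output = g * p (elementwise).

    acoustic: list of acoustic features for one frame
    prosody:  list of prosodic features (e.g. energy, pitch) for the same frame
    weights:  len(prosody) rows, each with len(acoustic) + len(prosody) entries
    bias:     len(prosody) entries
    """
    concat = acoustic + prosody  # list concatenation [a; p]
    if len(weights) != len(prosody) or len(bias) != len(prosody):
        raise ValueError("need one weight row and one bias per prosodic feature")
    gates = []
    for row, b in zip(weights, bias):
        if len(row) != len(concat):
            raise ValueError("weight row length must equal len(acoustic) + len(prosody)")
        z = sum(w * x for w, x in zip(row, concat)) + b
        gates.append(sigmoid(z))  # each gate lies in (0, 1)
    # Gates near 0 suppress a prosodic feature; gates near 1 retain it.
    gated = [g * p for g, p in zip(gates, prosody)]
    return gated, gates


# Usage: zero weights and bias give gates of 0.5, halving each prosodic feature.
out, g = gated_prosody([0.2, -0.1], [1.0, 4.0], [[0.0] * 4, [0.0] * 4], [0.0, 0.0])
print(g, out)
```

In a trained model the weights would be learned jointly with the classifier, so the gate values would depend on the frame's content rather than being fixed as in this toy usage.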

Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems

Jul 12, 2021

Anirudh Sreeram, Nicholas Mehlman, Raghuveer Peri, Dillon Knox, Shrikanth Narayanan

Abstract: In this paper we investigate speech denoising as a defense against adversarial attacks on automatic speech recognition (ASR) systems. Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal. We propose to counteract this by employing a neural-network-based denoiser as a pre-processor in the ASR pipeline. The denoiser is independent of the downstream ASR model and can therefore be rapidly deployed in existing systems. We found that training the denoiser with a perceptually motivated loss function increased adversarial robustness without compromising ASR performance on benign samples. Our defense was evaluated (as part of the DARPA GARD program) against the 'Kenansville' attack strategy across a range of attack strengths and speech samples. An average improvement in word error rate (WER) of about 7.7% over the undefended model was observed at an attack strength of 20 dB signal-to-noise ratio (SNR).

* 5 pages, 4 figures; submitted to ASRU 2021
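
The abstract measures attack strength as an SNR in dB and reports results in WER. The sketch below shows both quantities under their standard definitions: SNR = 10 log10(P_signal / P_perturbation), and WER = (substitutions + deletions + insertions) / reference length, computed via word-level edit distance. The perturbation here is an arbitrary vector rescaled to a target SNR; this is not the Kenansville attack, and the helper names (`scale_to_snr`, `wer`) are hypothetical.

```python
import math


def power(x):
    """Mean power of a signal given as a list of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(signal, perturbation):
    """SNR in dB: 10 * log10(P_signal / P_perturbation)."""
    return 10.0 * math.log10(power(signal) / power(perturbation))


def scale_to_snr(signal, perturbation, target_db):
    """Rescale a perturbation so that its SNR against `signal` equals target_db.

    Solves 10 * log10(Ps / (k^2 * Pd)) = target_db for the gain k.
    """
    k = math.sqrt(power(signal) / (power(perturbation) * 10 ** (target_db / 10.0)))
    return [k * v for v in perturbation]


def wer(reference, hypothesis):
    """Word error rate via word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)


# Usage: one substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(wer("the cat sat on the mat", "the cat sit on mat"))
```

At 20 dB SNR the perturbation carries 1/100 of the signal's power, so a smaller dB value corresponds to a stronger attack.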
