
Title: Characterizing Speech Adversarial Examples Using Self-Attention U-Net Enhancement

Mar 31, 2020

Chao-Han Huck Yang, Jun Qi, Pin-Yu Chen, Xiaoli Ma, Chin-Hui Lee


Abstract: Recent studies have highlighted adversarial examples as ubiquitous threats to deep neural network (DNN) based speech recognition systems. In this work, we present a U-Net based attention model, U-Net$_{At}$, to enhance adversarial speech signals. Specifically, we evaluate the model with interpretable speech recognition metrics and discuss its performance under augmented adversarial training. Our experiments show that the proposed U-Net$_{At}$ improves the perceptual evaluation of speech quality (PESQ) from 1.13 to 2.78, the speech transmission index (STI) from 0.65 to 0.75, and the short-term objective intelligibility (STOI) from 0.83 to 0.96 on the task of speech enhancement with adversarial speech examples. We also conduct experiments on the automatic speech recognition (ASR) task with adversarial audio attacks. We find that (i) temporal features learned by the attention network are capable of enhancing the robustness of DNN based ASR models; and (ii) the generalization power of DNN based ASR models can be enhanced by applying adversarial training with additive adversarial data augmentation. In terms of word error rate (WER), the ASR results show an absolute decrease of 2.22% under gradient-based perturbation and an absolute decrease of 2.03% under evolutionary-optimized perturbation, which suggests that our enhancement models with adversarial training can further secure a resilient ASR system.

* The first draft was finished in August 2019. Accepted to IEEE ICASSP 2020.
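The WER results in the abstract are reported as absolute decreases, that is, differences in percentage points rather than relative reductions. As a minimal sketch, assuming the standard definition WER = (S + D + I) / N computed from a word-level Levenshtein alignment (the abstract does not name the scoring tool used), the Python below computes WER for one utterance and the absolute change between two systems. The helpers `word_error_rate` and `absolute_change` are hypothetical names for illustration, not part of the authors' code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumes whitespace tokenization and the standard word-level Levenshtein distance.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # substitution (cost 1) or match (cost 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def absolute_change(wer_before: float, wer_after: float) -> float:
    """Absolute WER change in percentage points; negative means WER went down."""
    return (wer_after - wer_before) * 100.0


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (second "the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
    # Illustrative numbers only, not results from the paper.
    print(absolute_change(0.20, 0.18))
```

For instance, a drop from 20% to 18% WER is an absolute decrease of 2 percentage points but a relative decrease of 10%; these illustrative numbers are not taken from the paper.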
