Raymel Alfonso Sallo

Conditioning Trick for Training Stable GANs

Oct 12, 2020

Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

Abstract: In this paper, we propose a conditioning trick, called difference departure from normality, applied to the generator network to address instability during GAN training. We force the generator to approach the departure-from-normality function of real samples, computed in the spectral domain of the Schur decomposition. This binding makes the generator amenable to truncation and does not limit exploration of all possible modes. We slightly modify the BigGAN architecture by incorporating a residual network for synthesizing 2D representations of audio signals, which enables reconstructing high-quality sounds with some preserved phase information. Additionally, the proposed conditional training scenario makes a trade-off between fidelity and variety for the generated spectrograms. Experimental results on the UrbanSound8k and ESC-50 environmental sound datasets and the Mozilla Common Voice dataset show that the proposed GAN configuration with the conditioning trick remarkably outperforms baseline architectures according to three objective metrics: inception score, Fréchet inception distance, and signal-to-noise ratio.
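As a rough illustration of the quantity the abstract refers to, the sketch below computes the classical (Henrici) departure from normality of a square matrix: the Frobenius norm of the strictly upper-triangular part of its complex Schur form. It assumes the 2D representation is a square array; the `dfn_gap` helper is a hypothetical stand-in for comparing generated and real samples and is not the authors' formulation.

```python
import numpy as np
from scipy.linalg import schur


def departure_from_normality(a: np.ndarray) -> float:
    """Frobenius norm of the strictly upper-triangular part of the Schur form.

    For A = Q (D + N) Q^H with D diagonal and N strictly upper triangular,
    this returns ||N||_F, which is zero exactly when A is a normal matrix.
    """
    t, _ = schur(np.asarray(a, dtype=complex), output="complex")
    return float(np.linalg.norm(np.triu(t, k=1), ord="fro"))


def dfn_gap(generated: np.ndarray, real: np.ndarray) -> float:
    # Hypothetical helper (an assumption, not taken from the paper):
    # absolute difference between the two departure-from-normality values.
    return abs(departure_from_normality(generated) - departure_from_normality(real))


if __name__ == "__main__":
    upper = np.array([[1.0, 2.0], [0.0, 3.0]])      # non-normal matrix
    symmetric = np.array([[2.0, 1.0], [1.0, 2.0]])  # normal matrix
    print(departure_from_normality(upper))
    print(departure_from_normality(symmetric))
    print(dfn_gap(upper, symmetric))
```

A symmetric (hence normal) matrix gives a value near zero, while the triangular example does not, so a term of this kind could in principle be added to a generator loss to pull generated samples toward the statistic of real ones.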

Adversarially Training for Audio Classifiers

Aug 26, 2020

Raymel Alfonso Sallo, Mohammad Esmaeilpour, Patrick Cardinal

Abstract: In this paper, we investigate the potential effect of adversarial training on the robustness of six advanced deep neural networks against a variety of targeted and non-targeted adversarial attacks. We first show that the ResNet-56 model trained on the 2D representation of the discrete wavelet transform appended with the tonnetz chromagram outperforms other models in terms of recognition accuracy. We then demonstrate the positive impact of adversarial training on this model, as well as on other deep architectures, against six types of attack algorithms (white-box and black-box), at the cost of reduced recognition accuracy and limited adversarial perturbation. We run our experiments on two benchmark environmental sound datasets and show that, without any imposed limit on the adversary's budget, the fooling rate of the adversarially trained models can exceed 90%. In other words, adversarial attacks exist at any scale, but they might require higher adversarial perturbations than for non-adversarially trained models.

* 8 pages

Improving Stability of LS-GANs for Audio and Speech Signals

Aug 12, 2020

Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

Abstract: In this paper, we address the instability issue of generative adversarial networks (GANs) by proposing a new similarity metric in the unitary space of the Schur decomposition for 2D representations of audio and speech signals. We show that encoding the departure from normality computed in this vector space into the generator optimization formulation helps to craft more comprehensive spectrograms. We demonstrate the effectiveness of binding this metric for enhancing training stability, with less mode collapse than baseline GANs. Experimental results on subsets of the UrbanSound8k and Mozilla Common Voice datasets show considerable improvements in the quality of the generated samples as measured by the Fréchet inception distance. Moreover, signals reconstructed from these samples achieve a higher signal-to-noise ratio than those of regular LS-GANs.
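For readers unfamiliar with the last metric, here is a minimal sketch of a signal-to-noise ratio in decibels between a reference waveform and a reconstruction. It uses the textbook definition (signal power over residual power); the function name `snr_db` is illustrative, and the exact variant used in the paper is not specified in the abstract.

```python
import numpy as np


def snr_db(reference: np.ndarray, reconstructed: np.ndarray) -> float:
    """Signal-to-noise ratio in dB, treating the residual as noise.

    Assumes both arrays have the same shape and the residual is non-zero.
    """
    reference = np.asarray(reference, dtype=float)
    residual = reference - np.asarray(reconstructed, dtype=float)
    signal_power = np.sum(reference ** 2)
    noise_power = np.sum(residual ** 2)
    return float(10.0 * np.log10(signal_power / noise_power))


if __name__ == "__main__":
    clean = np.ones(4)
    noisy = clean + 0.1  # constant offset acting as the "noise"
    print(snr_db(clean, noisy))  # close to 20 dB, up to rounding
```

A higher value means the reconstruction deviates less from the reference, which is the sense in which the abstract compares reconstructed signals against regular LS-GANs.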

* 10 pages
