Olivier St-Georges

Conditioning Trick for Training Stable GANs

Oct 12, 2020

Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

Abstract: In this paper we propose a conditioning trick, called difference departure from normality, applied to the generator network to address instability during GAN training. We force the generator to approach the departure from normality of real samples, computed in the spectral domain of the Schur decomposition. This binding makes the generator amenable to truncation without limiting its exploration of all possible modes. We slightly modify the BigGAN architecture by incorporating a residual network for synthesizing 2D representations of audio signals, which enables reconstructing high-quality sounds with some preserved phase information. Additionally, the proposed conditional training scenario trades off fidelity against variety in the generated spectrograms. Experimental results on the UrbanSound8k and ESC-50 environmental sound datasets and the Mozilla Common Voice dataset show that the proposed GAN configuration with the conditioning trick substantially outperforms baseline architectures on three objective metrics: inception score, Fréchet inception distance, and signal-to-noise ratio.

Improving Stability of LS-GANs for Audio and Speech Signals

Aug 12, 2020

Mohammad Esmaeilpour, Raymel Alfonso Sallo, Olivier St-Georges, Patrick Cardinal, Alessandro Lameiras Koerich

Abstract: In this paper we address the instability of generative adversarial networks (GANs) by proposing a new similarity metric in the unitary space of the Schur decomposition for 2D representations of audio and speech signals. We show that encoding the departure from normality computed in this vector space into the generator's optimization objective helps to craft more comprehensive spectrograms. We demonstrate that binding this metric enhances training stability, with less mode collapse than baseline GANs. Experimental results on subsets of the UrbanSound8k and Mozilla Common Voice datasets show considerable improvements in the quality of the generated samples, as measured by the Fréchet inception distance. Moreover, signals reconstructed from these samples achieve a higher signal-to-noise ratio than those from regular LS-GANs.

* 10 pages
