Pablo Martínez Olmos

Regularizing Transformers With Deep Probabilistic Layers

Aug 23, 2021

Aurora Cobo Aguilera, Pablo Martínez Olmos, Antonio Artés-Rodríguez, Fernando Pérez-Cruz

Abstract: Language models (LMs) have grown non-stop over the last decade, from sequence-to-sequence architectures to state-of-the-art, fully attention-based Transformers. In this work, we demonstrate how including deep generative models within BERT can yield more versatile models, able to impute missing or noisy words with richer text or even improve the BLEU score. More precisely, we use a Gaussian Mixture Variational Autoencoder (GMVAE) as a regularizer layer and show its effectiveness not only in Transformers but also in the most relevant encoder-decoder LMs: seq2seq with and without attention.

Robust Sampling in Deep Learning

Jun 05, 2020

Aurora Cobo Aguilera, Antonio Artés-Rodríguez, Fernando Pérez-Cruz, Pablo Martínez Olmos

Abstract: Deep learning requires regularization mechanisms to reduce overfitting and improve generalization. We address this problem with a new regularization method based on distributionally robust optimization. The key idea is to modify the contribution of each sample in order to tighten the empirical risk bound. During stochastic training, samples are selected according to their accuracy, so that the worst-performing samples contribute the most to the optimization. We study different scenarios and identify those in which the method speeds up convergence or increases accuracy.

* 8 pages, 3 figures
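
The sampling idea in the abstract above (harder samples contribute more during stochastic training) can be illustrated with a minimal sketch. The abstract does not give the exact weighting rule, so the softmax-over-loss form, the `temperature` parameter, and the function names below are assumptions made for illustration only, not the authors' method.

```python
import math
import random


def sampling_probabilities(losses, temperature=1.0):
    """Map per-sample losses to sampling probabilities.

    Assumption: a softmax over loss / temperature, so that samples with
    higher loss (worse performance) are drawn more often. This rule is a
    hypothetical stand-in; the paper's actual scheme may differ.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [loss / temperature for loss in losses]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def draw_minibatch(losses, batch_size, temperature=1.0, rng=None):
    """Draw sample indices (with replacement) in proportion to their loss."""
    rng = rng or random.Random()
    probs = sampling_probabilities(losses, temperature)
    return rng.choices(range(len(losses)), weights=probs, k=batch_size)


# Usage: the third sample has the largest loss, so it is the most likely pick.
example_losses = [0.1, 0.2, 2.5, 0.3]
example_probs = sampling_probabilities(example_losses)
example_batch = draw_minibatch(example_losses, batch_size=8, rng=random.Random(0))
```

In this sketch, a large `temperature` flattens the distribution toward uniform sampling (ordinary SGD), while a small one concentrates the minibatch on the worst-performing samples.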
