Title: Learning to Sample Replacements for ELECTRA Pre-Training

Jun 25, 2021

Yaru Hao, Li Dong, Hangbo Bao, Ke Xu, Furu Wei

Abstract: ELECTRA pretrains a discriminator to detect replaced tokens, where the replacements are sampled from a generator trained with masked language modeling. Despite its compelling performance, ELECTRA suffers from two issues. First, there is no direct feedback loop from the discriminator to the generator, which renders replacement sampling inefficient. Second, the generator's predictions tend to become over-confident as training proceeds, biasing replacements toward the correct tokens. In this paper, we propose two methods to improve replacement sampling for ELECTRA pre-training. Specifically, we augment sampling with a hardness prediction mechanism, so that the generator can encourage the discriminator to learn what it has not yet acquired. We also prove that efficient sampling reduces the training variance of the discriminator. Moreover, we propose using a focal loss for the generator to relieve the oversampling of correct tokens as replacements. Experimental results show that our method improves ELECTRA pre-training on various downstream tasks.
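
To illustrate the focal-loss idea mentioned in the abstract, here is a minimal sketch in plain Python. It uses the standard focal loss form, -(1 - p_t)^gamma * log(p_t), applied to a single masked position. The function names, the example logits, and the value gamma = 2.0 are assumptions chosen for illustration; the abstract does not state the paper's exact formulation or hyperparameters.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one masked position (illustrative, not the paper's code).

    logits: generator scores over a (toy) vocabulary
    target: index of the original, correct token
    gamma:  focusing parameter (assumed value); gamma = 0 gives cross-entropy
    """
    p_t = softmax(logits)[target]
    # The (1 - p_t)^gamma factor shrinks the loss when the generator is
    # already confident about the correct token.
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def cross_entropy(logits, target):
    """Ordinary masked-language-modeling loss, the gamma = 0 special case."""
    return focal_loss(logits, target, gamma=0.0)


# Hypothetical logits: one confident prediction and one uncertain prediction.
confident = [5.0, 0.0, 0.0]
uncertain = [0.0, 0.0]

print(cross_entropy(confident, 0), focal_loss(confident, 0))
print(cross_entropy(uncertain, 0), focal_loss(uncertain, 0))
```

In this sketch the confident example is down-weighted far more strongly than the uncertain one, so the training signal concentrates on tokens the generator has not yet learned to predict. That is one way to read the abstract's goal of relieving the oversampling of correct tokens.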

* Accepted by Findings of ACL 2021
