Recent work on compressing pre-trained language models (PLMs) such as BERT has focused primarily on improving the downstream-task performance of the compressed models. However, there has been no study analyzing the impact of compression on the generalizability and robustness of these models. To this end, we study two popular model compression techniques, knowledge distillation and pruning, and show that the compressed models are significantly less robust than their PLM counterparts on adversarial test sets, even though they achieve similar performance on in-distribution development sets for a task. Further analysis indicates that the compressed models overfit the easy samples and generalize poorly on the hard ones. We leverage this observation to develop a regularization strategy for model compression based on sample uncertainty. Experimental results on several natural language understanding tasks demonstrate that our mitigation framework improves both the adversarial generalization and the in-distribution task performance of the compressed models.
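The abstract does not specify the form of the uncertainty-based regularizer, so the following is only a minimal PyTorch sketch of one plausible instantiation: a knowledge-distillation objective whose per-sample losses are reweighted by a proxy for sample uncertainty (here, the entropy of the teacher's predictive distribution), so that hard, uncertain examples receive more weight during compression. The function name `uncertainty_weighted_distillation_loss`, the entropy-based weighting, and the hyperparameters are illustrative assumptions, not the paper's actual method.

```python
import torch
import torch.nn.functional as F

def uncertainty_weighted_distillation_loss(student_logits, teacher_logits, labels,
                                           temperature=2.0, alpha=0.5):
    """Hypothetical sketch: reweight per-sample distillation losses by a
    teacher-uncertainty proxy so hard samples contribute more. The actual
    uncertainty estimate in the paper may differ (e.g., variance across
    model checkpoints rather than predictive entropy)."""
    # Per-sample KL divergence between softened teacher and student distributions.
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    kd_loss = F.kl_div(s_log_probs, t_probs, reduction="none").sum(-1) * temperature ** 2

    # Per-sample cross-entropy against the gold labels.
    ce_loss = F.cross_entropy(student_logits, labels, reduction="none")

    # Uncertainty proxy (assumption): normalized entropy of the teacher's
    # unsoftened predictions, mapped to weights in [1, 2].
    probs = F.softmax(teacher_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
    max_entropy = torch.log(torch.tensor(float(probs.size(-1))))
    weights = 1.0 + entropy / max_entropy

    # Up-weight uncertain (hard) samples in the combined objective.
    per_sample = alpha * kd_loss + (1.0 - alpha) * ce_loss
    return (weights * per_sample).mean()
```

Under this reading, easy samples (on which the abstract says compressed models overfit) carry low teacher entropy and thus lower weight, while hard samples dominate the gradient, which is one way a regularizer could push the student toward better adversarial generalization.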