Title: Erasing Without Remembering: Safeguarding Knowledge Forgetting in Large Language Models

Feb 27, 2025

Huazheng Wang, Yongcheng Jing, Haifeng Sun, Yingjie Wang, Jingyu Wang, Jianxin Liao, Dacheng Tao

Abstract: In this paper, we explore machine unlearning from a novel dimension, by studying how to safeguard model unlearning in large language models (LLMs). Our goal is to prevent unlearned models from recalling any related memory of the targeted knowledge. We begin by uncovering a surprisingly simple yet overlooked fact: existing methods typically erase only the exact expressions of the targeted knowledge, leaving paraphrased or related information intact. To rigorously measure such oversights, we introduce UGBench, the first benchmark tailored for evaluating the generalisation performance across 13 state-of-the-art methods. UGBench reveals that unlearned models can still recall paraphrased answers and retain target facts in intermediate layers. To address this, we propose PERMU, a perturbation-based method that significantly enhances the generalisation capabilities for safeguarding LLM unlearning. Experiments demonstrate that PERMU delivers up to a 50.13% improvement in unlearning while maintaining a 43.53% boost in robust generalisation. Our code can be found at https://github.com/MaybeLizzy/UGBench.
