Title: The Butterfly Effect of Model Editing: Few Edits Can Trigger Large Language Models Collapse

Feb 18, 2024

Wanli Yang, Fei Sun, Xinyu Ma, Xun Liu, Dawei Yin, Xueqi Cheng

Abstract: Although model editing has shown promise in revising knowledge in Large Language Models (LLMs), its impact on the inherent capabilities of LLMs is often overlooked. In this work, we reveal a critical phenomenon: even a single edit can trigger model collapse, manifesting as significant performance degradation across various benchmark tasks. However, benchmarking LLMs after each edit, while necessary to prevent such collapses, is impractically time-consuming and resource-intensive. To mitigate this, we propose using perplexity as a surrogate metric, validated by extensive experiments demonstrating its strong correlation with downstream task performance. We further conduct an in-depth study of sequential editing, a practical setting for real-world scenarios, across various editing methods and LLMs, focusing on hard cases from our previous single-edit studies. The results indicate that nearly all examined editing methods result in model collapse after only a few edits. To facilitate further research, we have utilized GPT-3.5 to develop a new dataset, HardEdit, based on those hard cases. This dataset aims to establish the foundation for pioneering research in reliable model editing and the mechanisms underlying editing-induced model collapse. We hope this work can draw the community's attention to the potential risks inherent in model editing practices.
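
As a rough illustration of the surrogate metric mentioned in the abstract, perplexity can be computed from per-token log-probabilities as the exponential of their negative mean. The sketch below is not the authors' code: the function names, the `ratio_threshold` value, and the toy log-probabilities are all assumptions made for illustration, and the abstract does not specify how a collapse threshold should be chosen.

```python
import math


def perplexity(token_log_probs):
    """Perplexity = exp(-mean natural-log probability per token)."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def looks_collapsed(ppl_before, ppl_after, ratio_threshold=10.0):
    """Hypothetical heuristic (not from the paper): flag an edit when
    perplexity on held-out text grows by more than `ratio_threshold` times."""
    return ppl_after > ratio_threshold * ppl_before


# Toy values, made up for illustration: log-probabilities that a model
# assigns to the same four reference tokens before and after an edit.
log_probs_before = [math.log(0.25)] * 4
log_probs_after = [math.log(0.001)] * 4

ppl_before = perplexity(log_probs_before)
ppl_after = perplexity(log_probs_after)
print(ppl_before, ppl_after, looks_collapsed(ppl_before, ppl_after))
```

In practice the log-probabilities would come from scoring a fixed held-out text with the model before and after each edit, which is far cheaper than rerunning a full benchmark suite.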
