Abstract:Model editing has emerged as a practical approach for mitigating factual errors and outdated knowledge in large language models (LLMs). Among existing methods, the Locate-and-Edit (L&E) paradigm is the dominant framework: it locates MLP parameters implicated in expressing a target fact, and then performs a localized update to rewrite that fact. However, long sequences of edits often trigger abrupt model collapse in L&E beyond a critical point. We empirically identify a strong correlation between collapse and explosive growth of edited MLP weight norms, and formally prove that commonly used L&E update rules can induce exponential norm growth across sequential edits in the absence of explicit norm control. To address this issue, we propose Norm-Anchor Scaling NAS, a plug-and-play norm-constrained strategy. Across extensive experiments, NAS delays the collapse point of representative L&E algorithms by more than 4 times and yields a 72.2% average relative gain in editing performance, requiring only a single additional line of code and incurring negligible computational overhead.
Abstract:In this paper, we derive a Bayesian model order selection rule by using the exponentially embedded family method, termed Bayesian EEF. Unlike many other Bayesian model selection methods, the Bayesian EEF can use vague proper priors and improper noninformative priors to be objective in the elicitation of parameter priors. Moreover, the penalty term of the rule is shown to be the sum of half of the parameter dimension and the estimated mutual information between parameter and observed data. This helps to reveal the EEF mechanism in selecting model orders and may provide new insights into the open problems of choosing an optimal penalty term for model order selection and choosing a good prior from information theoretic viewpoints. The important example of linear model order selection is given to illustrate the algorithms and arguments. Lastly, the Bayesian EEF that uses Jeffreys prior coincides with the EEF rule derived by frequentist strategies. This shows another interesting relationship between the frequentist and Bayesian philosophies for model selection.