Ashia Wilson

UCD: Unlearning in LLMs via Contrastive Decoding

Jun 12, 2025

Vinith M. Suriyakumar, Ayush Sekhari, Ashia Wilson

Abstract: Machine unlearning aims to remove specific information, e.g. sensitive or undesirable content, from large language models (LLMs) while preserving overall performance. We propose an inference-time unlearning algorithm that uses contrastive decoding, leveraging two auxiliary smaller models, one trained without the forget set and one trained with it, to guide the outputs of the original model using their difference during inference. Our strategy substantially improves the tradeoff between unlearning effectiveness and model utility. We evaluate our approach on two unlearning benchmarks, TOFU and MUSE. Results show notable gains in both forget quality and retained performance in comparison to prior approaches, suggesting that incorporating contrastive decoding can offer an efficient, practical avenue for unlearning concepts in large-scale models.
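
The abstract describes guiding the large model with the difference between two smaller models. One natural reading, sketched below under assumptions, adds a scaled difference of the small models' next-token logits to the large model's logits before the softmax. The combination rule, the strength parameter `alpha`, the function names, and the toy logits are hypothetical illustrations, not the paper's exact decoding rule.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - np.max(logits, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=-1, keepdims=True)

def contrastive_unlearning_probs(base_logits, without_forget_logits,
                                 with_forget_logits, alpha=1.0):
    """Hypothetical next-token rule: shift the large model's logits by the
    difference between a small model trained WITHOUT the forget set and one
    trained WITH it, scaled by a hypothetical strength `alpha`."""
    base = np.asarray(base_logits, dtype=float)
    delta = (np.asarray(without_forget_logits, dtype=float)
             - np.asarray(with_forget_logits, dtype=float))
    return softmax(base + alpha * delta)

# Toy 4-token vocabulary; token 2 stands in for content from the forget set.
base = np.array([1.0, 0.5, 3.0, 0.2])
small_without = np.array([1.0, 0.5, 0.0, 0.2])
small_with = np.array([1.0, 0.5, 2.5, 0.2])

p_base = softmax(base)
p_new = contrastive_unlearning_probs(base, small_without, small_with, alpha=1.0)
print(p_base.round(3), p_new.round(3))
```

When the two auxiliary models agree, the difference vanishes and the large model's distribution is left unchanged; only tokens on which they disagree are pushed up or down.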

Layered Unlearning for Adversarial Relearning

May 14, 2025

Timothy Qian, Vinith Suriyakumar, Ashia Wilson, Dylan Hadfield-Menell

Abstract: Our goal is to understand how post-training methods, such as fine-tuning, alignment, and unlearning, modify language model behavior and representations. We are particularly interested in the brittle nature of these modifications, which makes them easy to bypass through prompt engineering or relearning. Recent results suggest that post-training induces shallow, context-dependent "circuits" that suppress specific response patterns. This could be one explanation for the brittleness of post-training. To test this hypothesis, we design an unlearning algorithm, Layered Unlearning (LU), that creates distinct inhibitory mechanisms for a growing subset of the data. By unlearning the first $i$ folds while retaining the remaining $k - i$ folds at the $i$th of $k$ stages, LU limits the ability of relearning on a subset of the data to recover the full dataset. We evaluate LU through a combination of synthetic and large language model (LLM) experiments. We find that LU improves robustness to adversarial relearning for several different unlearning methods. Our results contribute to the state of the art in machine unlearning and provide insight into the effect of post-training updates.

* 37 pages, 8 figures
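
The fold schedule described in the abstract (at stage $i$ of $k$, unlearn the first $i$ folds and retain the remaining $k - i$) can be written out as a small helper. This sketches the bookkeeping only; the per-stage unlearning method is left abstract, and `layered_unlearning_schedule` is a hypothetical name.

```python
from typing import List, Tuple

def layered_unlearning_schedule(k: int) -> List[Tuple[List[int], List[int]]]:
    """Return (forget_folds, retain_folds) for stages i = 1..k.

    At stage i the first i folds are unlearned and the remaining k - i
    folds are retained, following the description in the abstract.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    stages = []
    for i in range(1, k + 1):
        forget = list(range(i))      # folds 0..i-1
        retain = list(range(i, k))   # folds i..k-1
        stages.append((forget, retain))
    return stages

for stage, (forget, retain) in enumerate(layered_unlearning_schedule(3), start=1):
    print(f"stage {stage}: unlearn folds {forget}, retain folds {retain}")
```

Each stage forgets a strictly larger set than the previous one, so relearning a single fold would, on this reading, only undo the inhibition added at the earliest stage that covers it.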

High-accuracy sampling from constrained spaces with the Metropolis-adjusted Preconditioned Langevin Algorithm

Dec 24, 2024

Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

Abstract: In this work, we propose a first-order sampling method called the Metropolis-adjusted Preconditioned Langevin Algorithm for approximate sampling from a target distribution whose support is a proper convex subset of $\mathbb{R}^{d}$. Our proposed method is the result of applying a Metropolis-Hastings filter to the Markov chain formed by a single step of the preconditioned Langevin algorithm with a metric $\mathscr{G}$, and is motivated by the natural gradient descent algorithm for optimisation. We derive non-asymptotic upper bounds for the mixing time of this method for sampling from target distributions whose potentials are bounded relative to $\mathscr{G}$, and for exponential distributions restricted to the support. Our analysis suggests that if $\mathscr{G}$ satisfies stronger notions of self-concordance introduced in Kook and Vempala (2024), then these mixing time upper bounds have a strictly better dependence on the dimension than when $\mathscr{G}$ is merely self-concordant. We also provide numerical experiments that demonstrate the practicality of our proposed method. Our method is a high-accuracy sampler due to the polylogarithmic dependence on the error tolerance in our mixing time upper bounds.

* 55 pages, 5 figures, 2 tables. Shorter version without experiments accepted at ALT 2025
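
As a rough illustration of a Metropolis-adjusted preconditioned Langevin step, the sketch below samples in one dimension from a density proportional to $e^{-Ax}$ restricted to $(0, 1)$ (an exponential distribution restricted to a convex support), using the Hessian of the log-barrier $-\log x - \log(1 - x)$ as the metric $\mathscr{G}$. It assumes the proposal $y = x - h\,\mathscr{G}(x)^{-1}\nabla f(x) + \sqrt{2h}\,\mathscr{G}(x)^{-1/2}\xi$ followed by a standard Metropolis-Hastings correction; the target, the step size `h`, and all names are illustrative choices, not taken from the paper.

```python
import numpy as np

A = 2.0  # hypothetical slope: target density proportional to exp(-A * x) on (0, 1)

def f(x):
    return A * x  # potential

def grad_f(x):
    return A

def metric(x):
    # Hessian of the log-barrier -log(x) - log(1 - x): a self-concordant choice of G.
    return 1.0 / x**2 + 1.0 / (1.0 - x) ** 2

def log_proposal(x, y, h):
    # Log density of y ~ N(x - h G(x)^{-1} f'(x), 2h G(x)^{-1}).
    g = metric(x)
    mean = x - h * grad_f(x) / g
    var = 2.0 * h / g
    return -0.5 * np.log(2.0 * np.pi * var) - (y - mean) ** 2 / (2.0 * var)

def mapla_step(x, h, rng):
    # Preconditioned Langevin proposal followed by a Metropolis-Hastings filter.
    g = metric(x)
    y = x - h * grad_f(x) / g + np.sqrt(2.0 * h / g) * rng.standard_normal()
    if not (0.0 < y < 1.0):
        return x  # proposals outside the support have zero target density
    log_alpha = (-f(y) + log_proposal(y, x, h)) - (-f(x) + log_proposal(x, y, h))
    return y if np.log(rng.uniform()) < log_alpha else x

def run_chain(n_steps, h=0.25, x0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, np.empty(n_steps)
    for t in range(n_steps):
        x = mapla_step(x, h, rng)
        samples[t] = x
    return samples

samples = run_chain(30_000)
exact_mean = 1.0 / A - 1.0 / np.expm1(A)  # mean of Exp(A) truncated to (0, 1)
print(samples.mean(), exact_mean)
```

Because the proposal variance $2h/\mathscr{G}(x)$ shrinks roughly like $x^2$ near the boundary, steps adapt to the local geometry, while the accept-reject filter keeps the target as the chain's stationary distribution.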

Faster Machine Unlearning via Natural Gradient Descent

Jul 11, 2024

Omri Lev, Ashia Wilson

Abstract: We address the challenge of efficiently and reliably deleting data from machine learning models trained using Empirical Risk Minimization (ERM), a process known as machine unlearning. To avoid retraining models from scratch, we propose a novel algorithm leveraging Natural Gradient Descent (NGD). Our theoretical framework ensures strong privacy guarantees for convex models, while a practical Min/Max optimization algorithm is developed for non-convex models. Comprehensive evaluations show significant improvements in privacy, computational efficiency, and generalization compared to state-of-the-art methods, advancing both the theoretical and practical aspects of machine unlearning.
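
A minimal sketch of natural-gradient unlearning for a convex model, assuming L2-regularized logistic regression: starting from the full-data optimum, take one natural-gradient step on the loss over the retained data only. For logistic regression the Fisher information equals the Hessian of the average negative log-likelihood, so this step coincides with a Newton step. The sketch omits whatever mechanism the paper uses to obtain its privacy guarantees, as well as its Min/Max procedure for non-convex models; all names and the synthetic data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def grad(theta, X, y, lam):
    # Gradient of the mean logistic loss plus (lam / 2) * ||theta||^2.
    p = sigmoid(X @ theta)
    return X.T @ (p - y) / len(y) + lam * theta

def fisher(theta, X, lam):
    # Fisher information of the logistic model (equal to the loss Hessian), plus lam * I.
    p = sigmoid(X @ theta)
    w = p * (1.0 - p)
    return (X * w[:, None]).T @ X / len(X) + lam * np.eye(X.shape[1])

def natural_gradient_steps(theta, X, y, lam, steps):
    for _ in range(steps):
        theta = theta - np.linalg.solve(fisher(theta, X, lam), grad(theta, X, y, lam))
    return theta

def fit(X, y, lam, iters=50):
    # Full-data training: repeated natural-gradient steps from zero.
    return natural_gradient_steps(np.zeros(X.shape[1]), X, y, lam, iters)

def unlearn(theta_full, X_keep, y_keep, lam):
    # Hypothetical unlearning update: one natural-gradient step on the retained data.
    return natural_gradient_steps(theta_full, X_keep, y_keep, lam, steps=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_theta = np.array([1.0, -2.0, 0.5])
y = (rng.random(200) < sigmoid(X @ true_theta)).astype(float)
lam = 0.1

theta_full = fit(X, y, lam)
theta_unlearned = unlearn(theta_full, X[10:], y[10:], lam)  # forget the first 10 points
theta_retrained = fit(X[10:], y[10:], lam)                   # retraining baseline
print(np.linalg.norm(theta_unlearned - theta_retrained),
      np.linalg.norm(theta_full - theta_retrained))
```

The one-step update costs a single linear solve instead of a full retraining run, and because it starts close to the retrained optimum it lands much nearer to it than the original parameters.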

Mean-field underdamped Langevin dynamics and its spacetime discretization

Jan 17, 2024

Qiang Fu, Ashia Wilson

Abstract: We propose a new method called the N-particle underdamped Langevin algorithm for optimizing a special class of non-linear functionals defined over the space of probability measures. Examples of problems with this formulation include training mean-field neural networks, maximum mean discrepancy minimization and kernel Stein discrepancy minimization. Our algorithm is based on a novel spacetime discretization of the mean-field underdamped Langevin dynamics, for which we provide a new, fast mixing guarantee. In addition, we demonstrate that our algorithm converges globally in total variation distance, bridging the theoretical gap between the dynamics and its practical implementation.

* 40 pages, 5 figures, 2 tables
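
To make the particle picture concrete, the sketch below simulates $N$ interacting particles with positions and velocities for an assumed entropy-regularized functional $F(\mu) = \int V\,d\mu + \tfrac{1}{2}\iint W(x - y)\,d\mu(x)\,d\mu(y) + \int \mu \log \mu$, with the quadratic choices $V(x) = x^2/2$ and $W(z) = C z^2/2$. Each particle feels friction, the mean-field force evaluated at the empirical measure, and Gaussian noise. It uses a plain semi-implicit Euler update, not the spacetime discretization proposed in the paper; the functional, `C`, and the step parameters are assumptions chosen so the answer is known: the particle variance should settle near $1/(1 + C)$.

```python
import numpy as np

C = 1.0  # hypothetical interaction strength

def drift(x):
    # Gradient of the first variation of F at the empirical measure:
    # V(x) = x^2 / 2 (confinement) plus W(z) = C z^2 / 2 (pairwise attraction),
    # so the force on particle i is x_i + C * (x_i - mean(x)).
    return x + C * (x - x.mean())

def n_particle_underdamped_langevin(n_particles=4000, n_steps=5000, h=0.02,
                                    gamma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = 3.0 + rng.standard_normal(n_particles)  # start far from equilibrium
    v = np.zeros(n_particles)
    for _ in range(n_steps):
        # Velocity update: friction, mean-field force, and fresh Gaussian noise.
        noise = rng.standard_normal(n_particles)
        v = v - h * (gamma * v + drift(x)) + np.sqrt(2.0 * gamma * h) * noise
        # Position update with the new velocity (semi-implicit Euler).
        x = x + h * v
    return x

x = n_particle_underdamped_langevin()
print(x.mean(), x.var())  # variance should be close to 1 / (1 + C) = 0.5
```

The velocity variable is what distinguishes the underdamped dynamics from overdamped Langevin: the noise enters only through the velocity, and the friction `gamma` controls how quickly momentum is dissipated.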

Fast sampling from constrained spaces using the Metropolis-adjusted Mirror Langevin Algorithm

Dec 14, 2023

Vishwak Srinivasan, Andre Wibisono, Ashia Wilson

Abstract: We propose a new method called the Metropolis-adjusted Mirror Langevin algorithm for approximate sampling from distributions whose support is a compact and convex set. This algorithm adds an accept-reject filter to the Markov chain induced by a single step of the mirror Langevin algorithm (Zhang et al., 2020), which is a basic discretisation of the mirror Langevin dynamics. Due to the inclusion of this filter, our method is unbiased relative to the target, while known discretisations of the mirror Langevin dynamics including the mirror Langevin algorithm have an asymptotic bias. We give upper bounds for the mixing time of the proposed algorithm when the potential is relatively smooth, convex, and Lipschitz with respect to a self-concordant mirror function. As a consequence of the reversibility of the Markov chain induced by the algorithm, we obtain an exponentially better dependence on the error tolerance for approximate sampling. We also present numerical experiments that corroborate our theoretical findings.

* 48 pages, 6 figures, 2 tables
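
A Metropolis-adjusted mirror Langevin step can be sketched in one dimension. Here the mirror function is assumed to be the log-barrier $\phi(x) = -\log x - \log(1 - x)$ on $(0, 1)$: the mirror Langevin step moves $\nabla\phi(x)$ in the dual space by $-h\nabla f(x) + \sqrt{2h}\,[\nabla^2\phi(x)]^{1/2}\xi$, maps back with $(\nabla\phi)^{-1}$ (which has a closed form for this $\phi$), and the accept-reject filter uses the proposal density including the change-of-variables factor $\phi''(x')$. The target $e^{-Ax}$ on $(0, 1)$, the step size, and the function names are illustrative assumptions.

```python
import numpy as np

A = 2.0  # hypothetical slope: target density proportional to exp(-A * x) on (0, 1)

def f(x):
    return A * x

def grad_f(x):
    return A

def grad_phi(x):
    # Mirror map: gradient of the log-barrier phi(x) = -log(x) - log(1 - x).
    return -1.0 / x + 1.0 / (1.0 - x)

def hess_phi(x):
    return 1.0 / x**2 + 1.0 / (1.0 - x) ** 2

def grad_phi_inv(y):
    # Closed-form inverse of grad_phi, written to avoid cancellation: always in (0, 1).
    return 2.0 / (np.sqrt(y * y + 4.0) - y + 2.0)

def log_q(x, x_new, h):
    # Density of x_new when grad_phi(x_new) ~ N(grad_phi(x) - h f'(x), 2h phi''(x)),
    # including the change-of-variables factor phi''(x_new).
    mean = grad_phi(x) - h * grad_f(x)
    var = 2.0 * h * hess_phi(x)
    z = grad_phi(x_new)
    return (-0.5 * np.log(2.0 * np.pi * var) - (z - mean) ** 2 / (2.0 * var)
            + np.log(hess_phi(x_new)))

def mamla_step(x, h, rng):
    # Mirror Langevin step in the dual space, map back, then accept or reject.
    y = grad_phi(x) - h * grad_f(x) + np.sqrt(2.0 * h * hess_phi(x)) * rng.standard_normal()
    x_new = grad_phi_inv(y)
    log_alpha = (-f(x_new) + log_q(x_new, x, h)) - (-f(x) + log_q(x, x_new, h))
    return x_new if np.log(rng.uniform()) < log_alpha else x

def run_mamla(n_steps, h=0.25, x0=0.5, seed=0):
    rng = np.random.default_rng(seed)
    x, samples = x0, np.empty(n_steps)
    for t in range(n_steps):
        x = mamla_step(x, h, rng)
        samples[t] = x
    return samples

samples = run_mamla(30_000)
exact_mean = 1.0 / A - 1.0 / np.expm1(A)  # mean of Exp(A) truncated to (0, 1)
print(samples.mean(), exact_mean)
```

Every proposal stays inside the support because it is produced by the inverse mirror map, so no proposal is wasted on infeasible points; the filter only corrects for discretisation bias.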

What is a Fair Diffusion Model? Designing Generative Text-To-Image Models to Incorporate Various Worldviews

Sep 18, 2023

Zoe De Simone, Angie Boggust, Arvind Satyanarayan, Ashia Wilson

Abstract: Generative text-to-image (GTI) models produce high-quality images from short textual descriptions and are widely used in academic and creative domains. However, GTI models frequently amplify biases from their training data, often producing prejudiced or stereotypical images. Yet, current bias mitigation strategies are limited and primarily focus on enforcing gender parity across occupations. To enhance GTI bias mitigation, we introduce DiffusionWorldViewer, a tool to analyze and manipulate GTI models' attitudes, values, stories, and expectations of the world that impact their generated images. Through an interactive interface deployed as a web-based GUI and Jupyter Notebook plugin, DiffusionWorldViewer categorizes existing demographics of GTI-generated images and provides interactive methods to align image demographics with user worldviews. In a study with 13 GTI users, we find that DiffusionWorldViewer allows users to represent their varied viewpoints about what GTI outputs are fair and, in doing so, challenges current notions of fairness that assume a universal worldview.

* 20 pages, 5 figures

Approximate Cross-validation: Guarantees for Model Assessment and Selection

Mar 02, 2020

Ashia Wilson, Maximilian Kasy, Lester Mackey

Abstract: Cross-validation (CV) is a popular approach for assessing and selecting predictive models. However, when the number of folds is large, CV suffers from a need to repeatedly refit a learning procedure on a large number of training datasets. Recent work in empirical risk minimization (ERM) approximates the expensive refitting with a single Newton step warm-started from the full training set optimizer. While this can greatly reduce runtime, several open questions remain, including whether these approximations lead to faithful model selection and whether they are suitable for non-smooth objectives. We address these questions with three main contributions: (i) we provide uniform non-asymptotic, deterministic model assessment guarantees for approximate CV; (ii) we show that (roughly) the same conditions also guarantee model selection performance comparable to CV; (iii) we provide a proximal Newton extension of the approximate CV framework for non-smooth prediction problems and develop improved assessment guarantees for problems such as $\ell_1$-regularized ERM.

* 8 pages, 3 figures
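
The single-Newton-step approximation can be illustrated on leave-one-out CV for ridge regression. Dropping observation $i$ from a full objective with Hessian $H$ at the optimizer $\hat\theta$ gives the one-step estimate $\tilde\theta_{-i} = \hat\theta + (H - \nabla^2\ell_i(\hat\theta))^{-1}\nabla\ell_i(\hat\theta)$. Because the ridge objective is quadratic, one Newton step is exact here, so the example is a sanity check of the mechanics rather than of the guarantees in the paper; the function names, `lam`, and the synthetic data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    # Minimizer of 0.5 * ||y - X theta||^2 + 0.5 * lam * ||theta||^2.
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def exact_loo_errors(X, y, lam):
    # Refit n times, each time leaving one observation out.
    n = len(y)
    errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i
        theta_i = ridge_fit(X[mask], y[mask], lam)
        errs[i] = y[i] - X[i] @ theta_i
    return errs

def approx_loo_errors(X, y, lam):
    # One Newton step on each leave-one-out objective, warm-started at the
    # full-data optimizer: theta_{-i} = theta_hat + H_{-i}^{-1} grad loss_i(theta_hat).
    theta_hat = ridge_fit(X, y, lam)
    H = X.T @ X + lam * np.eye(X.shape[1])  # Hessian of the full objective
    errs = np.empty(len(y))
    for i in range(len(y)):
        x_i = X[i]
        resid = y[i] - x_i @ theta_hat
        grad_i = -resid * x_i               # gradient of 0.5 * (y_i - x_i theta)^2
        H_minus_i = H - np.outer(x_i, x_i)  # remove observation i's Hessian term
        theta_i = theta_hat + np.linalg.solve(H_minus_i, grad_i)
        errs[i] = y[i] - x_i @ theta_i
    return errs

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + 0.3 * rng.standard_normal(50)
print(np.mean(approx_loo_errors(X, y, 1.0) ** 2), np.mean(exact_loo_errors(X, y, 1.0) ** 2))
```

For non-quadratic losses the same update is only approximate, which is where assessment and selection guarantees like those in the abstract become relevant.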
