Stanley Wu

Safe Domain Randomization via Uncertainty-Aware Out-of-Distribution Detection and Policy Adaptation

Jul 08, 2025

Mohamad H. Danesh, Maxime Wabartha, Stanley Wu, Joelle Pineau, Hsiu-Chin Lin

Abstract: Deploying reinforcement learning (RL) policies in the real world involves significant challenges, including distribution shifts, safety concerns, and the impracticality of direct interaction during policy refinement. Existing methods, such as domain randomization (DR) and off-dynamics RL, enhance policy robustness through direct interaction with the target domain, an inherently unsafe practice. We propose Uncertainty-Aware RL (UARL), a novel framework that prioritizes safety during training by addressing Out-Of-Distribution (OOD) detection and policy adaptation without requiring direct interaction in the target domain. UARL employs an ensemble of critics to quantify policy uncertainty and incorporates progressive environmental randomization to prepare the policy for diverse real-world conditions. By iteratively refining the policy over high-uncertainty regions of the state space in simulated environments, UARL achieves robust generalization to the target domain without explicitly training on it. We evaluate UARL on MuJoCo benchmarks and a quadrupedal robot, demonstrating reliable OOD detection, improved performance, and better sample efficiency compared to baselines.
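A common way to turn an ensemble of critics into an uncertainty signal is to measure how much the critics disagree on a given state-action pair. The sketch below illustrates that general idea and is not the UARL implementation. It assumes each critic is a callable returning a scalar Q-value, uses the population standard deviation as the disagreement measure, and treats the OOD threshold as a hypothetical tuning parameter.

```python
import statistics
from typing import Callable, List, Sequence

# Assumption: each critic maps (state, action) to a scalar Q-value estimate.
Critic = Callable[[Sequence[float], Sequence[float]], float]


def ensemble_uncertainty(critics: List[Critic], state: Sequence[float],
                         action: Sequence[float]) -> float:
    """Ensemble disagreement: population std. dev. of the critics' Q-estimates."""
    q_values = [critic(state, action) for critic in critics]
    return statistics.pstdev(q_values)


def is_ood(critics: List[Critic], state: Sequence[float],
           action: Sequence[float], threshold: float) -> bool:
    """Flag (state, action) as OOD when disagreement exceeds a hypothetical threshold."""
    return ensemble_uncertainty(critics, state, action) > threshold


# Toy ensemble: the critics agree at state 0 and diverge as |state| grows.
critics = [lambda s, a, k=k: s[0] + k * s[0] ** 2 for k in range(3)]
print(ensemble_uncertainty(critics, [0.0], [0.0]))  # 0.0
print(is_ood(critics, [2.0], [0.0], threshold=1.0))  # True
```

In a training loop, states flagged this way would mark the high-uncertainty regions where further refinement in simulation is concentrated.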

Disrupting Style Mimicry Attacks on Video Imagery

May 11, 2024

Josephine Passananti, Stanley Wu, Shawn Shan, Haitao Zheng, Ben Y. Zhao

Abstract: Generative AI models are often used to perform mimicry attacks, where a pretrained model is fine-tuned on a small sample of images to learn to mimic a specific artist of interest. While researchers have introduced multiple anti-mimicry protection tools (Mist, Glaze, Anti-Dreambooth), recent evidence points to a growing trend of mimicry models using videos as sources of training data. This paper presents our experiences exploring techniques to disrupt style mimicry on video imagery. We first validate that mimicry attacks can succeed by training on individual frames extracted from videos. We show that while anti-mimicry tools can offer protection when applied to individual frames, this approach is vulnerable to an adaptive countermeasure that removes protection by exploiting randomness in the optimization results of consecutive (nearly identical) frames. We develop a new, tool-agnostic framework that segments videos into short scenes based on frame-level similarity and uses a per-scene optimization baseline to remove inter-frame randomization while reducing computational cost. We show via both image-level metrics and an end-to-end user study that the resulting framework restores protection against mimicry (including the countermeasure). Finally, we develop another adaptive countermeasure and find that it falls short against our framework.
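Segmenting a video into short scenes by frame-level similarity can be sketched as a single pass over consecutive frames: start a new scene whenever two neighbouring frames are too dissimilar. This is an illustrative sketch, not the authors' method. The per-frame feature vectors, the choice of cosine similarity, and the 0.95 threshold are all assumptions.

```python
import math
from typing import List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def segment_scenes(frames: List[Sequence[float]], threshold: float = 0.95) -> List[List[int]]:
    """Group frame indices into scenes; a new scene starts when consecutive
    frames fall below the (hypothetical) similarity threshold."""
    if not frames:
        return []
    scenes = [[0]]
    for i in range(1, len(frames)):
        if cosine_similarity(frames[i - 1], frames[i]) >= threshold:
            scenes[-1].append(i)  # nearly identical to the previous frame
        else:
            scenes.append([i])  # scene cut
    return scenes


# Hypothetical 2-D frame features: two near-duplicate pairs with a cut between them.
frames = [[1.0, 0.0], [0.99, 0.01], [0.0, 1.0], [0.01, 1.0]]
print(segment_scenes(frames))  # [[0, 1], [2, 3]]
```

With scenes in hand, a protection tool could be run once per scene rather than once per frame. This is the general idea behind removing inter-frame randomness while cutting compute. How the perturbation is shared within a scene is not specified here.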

TMI! Finetuned Models Leak Private Information from their Pretraining Data

Jun 01, 2023

John Abascal, Stanley Wu, Alina Oprea, Jonathan Ullman

Abstract: Transfer learning has become an increasingly popular technique in machine learning as a way to leverage a model pretrained for one task to assist with building a finetuned model for a related task. This paradigm has been especially popular for privacy in machine learning, where the pretrained model is considered public and only the data for finetuning is considered sensitive. However, there are reasons to believe that the data used for pretraining is still sensitive, making it essential to understand how much information the finetuned model leaks about the pretraining data. In this work, we propose a new membership-inference threat model where the adversary only has access to the finetuned model and would like to infer the membership of the pretraining data. To realize this threat model, we implement a novel metaclassifier-based attack, TMI, that leverages the influence of memorized pretraining samples on predictions in the downstream task. We evaluate TMI on both vision and natural language tasks across multiple transfer learning settings, including finetuning with differential privacy. Through our evaluation, we find that TMI can successfully infer membership of pretraining examples using query access to the finetuned model.

How to Combine Membership-Inference Attacks on Multiple Updated Models

May 12, 2022

Matthew Jagielski, Stanley Wu, Alina Oprea, Jonathan Ullman, Roxana Geambasu

Abstract: A large body of research has shown that machine learning models are vulnerable to membership inference (MI) attacks that violate the privacy of the participants in the training data. Most MI research focuses on the case of a single standalone model, while production machine-learning platforms often update models over time on data whose distribution shifts, giving the attacker more information. This paper proposes new attacks that take advantage of one or more model updates to improve MI. A key part of our approach is to leverage rich information from standalone MI attacks mounted separately against the original and updated models, and to combine this information in specific ways to improve attack effectiveness. We propose a set of combination functions and tuning methods for each, and present both analytical and quantitative justification for various options. Our results on four public datasets show that our attacks are effective at using update information to give the adversary a significant advantage over attacks on standalone models, and also over a prior MI attack that takes advantage of model updates in a related machine-unlearning setting. We perform the first measurements of the impact of distribution shift on MI attacks with model updates, and show that a more drastic distribution shift results in significantly higher MI risk than a gradual shift. Our code is available at https://www.github.com/stanleykywu/model-updates.

* 31 pages, 9 figures
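The core idea of mounting standalone MI attacks separately on the original and updated models and then combining their outputs can be illustrated with a toy example. The sketch assumes a simple loss-based standalone score (negated per-record loss) and two hypothetical combination functions, "difference" and "sum". The threshold is also an assumption. None of this reproduces the paper's combination functions or tuning methods; see the linked repository for those.

```python
from typing import Callable, Dict

# Hypothetical combination functions over the two standalone MI scores.
COMBINERS: Dict[str, Callable[[float, float], float]] = {
    "difference": lambda s_orig, s_upd: s_upd - s_orig,  # how much the update "helped" the record
    "sum": lambda s_orig, s_upd: s_orig + s_upd,
}


def standalone_score(loss: float) -> float:
    """Toy loss-based MI score: a lower loss on a record suggests membership."""
    return -loss


def combined_guess(loss_orig: float, loss_upd: float,
                   combiner: str = "difference", threshold: float = 0.0) -> bool:
    """Combine standalone scores from the original and updated models into one guess."""
    s_orig = standalone_score(loss_orig)
    s_upd = standalone_score(loss_upd)
    return COMBINERS[combiner](s_orig, s_upd) > threshold


print(combined_guess(loss_orig=2.0, loss_upd=0.1))  # True: loss dropped sharply after the update
print(combined_guess(loss_orig=0.5, loss_upd=0.6))  # False
```

The "difference" combiner targets records whose loss drops after the update, which a single standalone attack on either model alone cannot see.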