Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaohua Jia

Exploring Incremental Unlearning: Techniques, Challenges, and Future Directions

Feb 23, 2025

Sadia Qureshi, Thanveer Shaik, Xiaohui Tao, Haoran Xie, Lin Li, Jianming Yong, Xiaohua Jia

Abstract:The growing demand for data privacy in Machine Learning (ML) applications has seen Machine Unlearning (MU) emerge as a critical area of research. As the `right to be forgotten' becomes regulated globally, it is increasingly important to develop mechanisms that delete user data from AI systems while maintaining performance and scalability of these systems. Incremental Unlearning (IU) is a promising MU solution to address the challenges of efficiently removing specific data from ML models without the need for expensive and time-consuming full retraining. This paper presents the various techniques and approaches to IU. It explores the challenges faced in designing and implementing IU mechanisms. Datasets and metrics for evaluating the performance of unlearning techniques are discussed as well. Finally, potential solutions to the IU challenges alongside future research directions are offered. This survey provides valuable insights for researchers and practitioners seeking to understand the current landscape of IU and its potential for enhancing privacy-preserving intelligent systems.

* This work has been submitted to the IEEE for possible publication

Via

Access Paper or Ask Questions

CAT: Contrastive Adversarial Training for Evaluating the Robustness of Protective Perturbations in Latent Diffusion Models

Feb 11, 2025

Sen Peng, Mingyue Wang, Jianfei He, Jijia Yang, Xiaohua Jia

Abstract:Latent diffusion models have recently demonstrated superior capabilities in many downstream image synthesis tasks. However, customization of latent diffusion models using unauthorized data can severely compromise the privacy and intellectual property rights of data owners. Adversarial examples as protective perturbations have been developed to defend against unauthorized data usage by introducing imperceptible noise to customization samples, preventing diffusion models from effectively learning them. In this paper, we first reveal that the primary reason adversarial examples are effective as protective perturbations in latent diffusion models is the distortion of their latent representations, as demonstrated through qualitative and quantitative experiments. We then propose the Contrastive Adversarial Training (CAT) utilizing adapters as an adaptive attack against these protection methods, highlighting their lack of robustness. Extensive experiments demonstrate that our CAT method significantly reduces the effectiveness of protective perturbations in customization configurations, urging the community to reconsider and enhance the robustness of existing protective perturbation methods. Code is available at \hyperlink{here}{https://github.com/senp98/CAT}.

Via

Access Paper or Ask Questions

Protective Perturbations against Unauthorized Data Usage in Diffusion-based Image Generation

Dec 25, 2024

Sen Peng, Jijia Yang, Mingyue Wang, Jianfei He, Xiaohua Jia

Figure 1 for Protective Perturbations against Unauthorized Data Usage in Diffusion-based Image Generation

Abstract:Diffusion-based text-to-image models have shown immense potential for various image-related tasks. However, despite their prominence and popularity, customizing these models using unauthorized data also brings serious privacy and intellectual property issues. Existing methods introduce protective perturbations based on adversarial attacks, which are applied to the customization samples. In this systematization of knowledge, we present a comprehensive survey of protective perturbation methods designed to prevent unauthorized data usage in diffusion-based image generation. We establish the threat model and categorize the downstream tasks relevant to these methods, providing a detailed analysis of their designs. We also propose a completed evaluation framework for these perturbation techniques, aiming to advance research in this field.

Via

Access Paper or Ask Questions

Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection

Oct 29, 2024

Jijia Yang, Sen Peng, Xiaohua Jia

Figure 1 for Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection

Figure 2 for Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection

Figure 3 for Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection

Figure 4 for Embedding Watermarks in Diffusion Process for Model Intellectual Property Protection

Abstract:In practical application, the widespread deployment of diffusion models often necessitates substantial investment in training. As diffusion models find increasingly diverse applications, concerns about potential misuse highlight the imperative for robust intellectual property protection. Current protection strategies either employ backdoor-based methods, integrating a watermark task as a simpler training objective with the main model task, or embedding watermarks directly into the final output samples. However, the former approach is fragile compared to existing backdoor defense techniques, while the latter fundamentally alters the expected output. In this work, we introduce a novel watermarking framework by embedding the watermark into the whole diffusion process, and theoretically ensure that our final output samples contain no additional information. Furthermore, we utilize statistical algorithms to verify the watermark from internally generated model samples without necessitating triggers as conditions. Detailed theoretical analysis and experimental validation demonstrate the effectiveness of our proposed method.

Via

Access Paper or Ask Questions

GReDP: A More Robust Approach for Differential Privacy Training with Gradient-Preserving Noise Reduction

Sep 18, 2024

Haodi Wang, Tangyu Jiang, Yu Guo, Xiaohua Jia, Chengjun Cai

Abstract:Deep learning models have been extensively adopted in various regions due to their ability to represent hierarchical features, which highly rely on the training set and procedures. Thus, protecting the training process and deep learning algorithms is paramount in privacy preservation. Although Differential Privacy (DP) as a powerful cryptographic primitive has achieved satisfying results in deep learning training, the existing schemes still fall short in preserving model utility, i.e., they either invoke a high noise scale or inevitably harm the original gradients. To address the above issues, in this paper, we present a more robust approach for DP training called GReDP. Specifically, we compute the model gradients in the frequency domain and adopt a new approach to reduce the noise level. Unlike the previous work, our GReDP only requires half of the noise scale compared to DPSGD [1] while keeping all the gradient information intact. We present a detailed analysis of our method both theoretically and empirically. The experimental results show that our GReDP works consistently better than the baselines on all models and training settings.

Via

Access Paper or Ask Questions

LMEraser: Large Model Unlearning through Adaptive Prompt Tuning

Apr 17, 2024

Jie Xu, Zihan Wu, Cong Wang, Xiaohua Jia

Abstract:To address the growing demand for privacy protection in machine learning, we propose a novel and efficient machine unlearning approach for \textbf{L}arge \textbf{M}odels, called \textbf{LM}Eraser. Existing unlearning research suffers from entangled training data and complex model architectures, incurring extremely high computational costs for large models. LMEraser takes a divide-and-conquer strategy with a prompt tuning architecture to isolate data influence. The training dataset is partitioned into public and private datasets. Public data are used to train the backbone of the model. Private data are adaptively clustered based on their diversity, and each cluster is used to optimize a prompt separately. This adaptive prompt tuning mechanism reduces unlearning costs and maintains model performance. Experiments demonstrate that LMEraser achieves a $100$-fold reduction in unlearning costs without compromising accuracy compared to prior work. Our code is available at: \url{https://github.com/lmeraser/lmeraser}.

Via

Access Paper or Ask Questions

Machine Unlearning: Solutions and Challenges

Aug 14, 2023

Jie Xu, Zihan Wu, Cong Wang, Xiaohua Jia

Figure 1 for Machine Unlearning: Solutions and Challenges

Figure 2 for Machine Unlearning: Solutions and Challenges

Figure 3 for Machine Unlearning: Solutions and Challenges

Figure 4 for Machine Unlearning: Solutions and Challenges

Abstract:Machine learning models may inadvertently memorize sensitive, unauthorized, or malicious data, posing risks of privacy violations, security breaches, and performance deterioration. To address these issues, machine unlearning has emerged as a critical technique to selectively remove specific training data points' influence on trained models. This paper provides a comprehensive taxonomy and analysis of machine unlearning research. We categorize existing research into exact unlearning that algorithmically removes data influence entirely and approximate unlearning that efficiently minimizes influence through limited parameter updates. By reviewing the state-of-the-art solutions, we critically discuss their advantages and limitations. Furthermore, we propose future directions to advance machine unlearning and establish it as an essential capability for trustworthy and adaptive machine learning. This paper provides researchers with a roadmap of open problems, encouraging impactful contributions to address real-world needs for selective data removal.

Via

Access Paper or Ask Questions

Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process

Jun 06, 2023

Sen Peng, Yufei Chen, Cong Wang, Xiaohua Jia

Figure 1 for Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process

Figure 2 for Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process

Figure 3 for Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process

Figure 4 for Protecting the Intellectual Property of Diffusion Models by the Watermark Diffusion Process

Abstract:Diffusion models have emerged as state-of-the-art deep generative architectures with the increasing demands for generation tasks. Training large diffusion models for good performance requires high resource costs, making them valuable intellectual properties to protect. While most of the existing ownership solutions, including watermarking, mainly focus on discriminative models. This paper proposes WDM, a novel watermarking method for diffusion models, including watermark embedding, extraction, and verification. WDM embeds the watermark data through training or fine-tuning the diffusion model to learn a Watermark Diffusion Process (WDP), different from the standard diffusion process for the task data. The embedded watermark can be extracted by sampling using the shared reverse noise from the learned WDP without degrading performance on the original task. We also provide theoretical foundations and analysis of the proposed method by connecting the WDP to the diffusion process with a modified Gaussian kernel. Extensive experiments are conducted to demonstrate its effectiveness and robustness against various attacks.

Via

Access Paper or Ask Questions

Privacy-Preserving Graph Neural Network Training and Inference as a Cloud Service

Feb 16, 2022

Songlei Wang, Yifeng Zheng, Xiaohua Jia

Figure 1 for Privacy-Preserving Graph Neural Network Training and Inference as a Cloud Service

Figure 2 for Privacy-Preserving Graph Neural Network Training and Inference as a Cloud Service

Figure 3 for Privacy-Preserving Graph Neural Network Training and Inference as a Cloud Service

Figure 4 for Privacy-Preserving Graph Neural Network Training and Inference as a Cloud Service

Abstract:Graphs are widely used to model the complex relationships among entities. As a powerful tool for graph analytics, graph neural networks (GNNs) have recently gained wide attention due to its end-to-end processing capabilities. With the proliferation of cloud computing, it is increasingly popular to deploy the services of complex and resource-intensive model training and inference in the cloud due to its prominent benefits. However, GNN training and inference services, if deployed in the cloud, will raise critical privacy concerns about the information-rich and proprietary graph data (and the resulting model). While there has been some work on secure neural network training and inference, they all focus on convolutional neural networks handling images and text rather than complex graph data with rich structural information. In this paper, we design, implement, and evaluate SecGNN, the first system supporting privacy-preserving GNN training and inference services in the cloud. SecGNN is built from a synergy of insights on lightweight cryptography and machine learning techniques. We deeply examine the procedure of GNN training and inference, and devise a series of corresponding secure customized protocols to support the holistic computation. Extensive experiments demonstrate that SecGNN achieves comparable plaintext training and inference accuracy, with practically affordable performance.

Via

Access Paper or Ask Questions

Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

Oct 20, 2021

Jindi Zhang, Yifan Zhang, Kejie Lu, Jianping Wang, Kui Wu, Xiaohua Jia, Bin Liu

Figure 1 for Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

Figure 2 for Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

Figure 3 for Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

Figure 4 for Detecting and Identifying Optical Signal Attacks on Autonomous Driving Systems

Abstract:For autonomous driving, an essential task is to detect surrounding objects accurately. To this end, most existing systems use optical devices, including cameras and light detection and ranging (LiDAR) sensors, to collect environment data in real time. In recent years, many researchers have developed advanced machine learning models to detect surrounding objects. Nevertheless, the aforementioned optical devices are vulnerable to optical signal attacks, which could compromise the accuracy of object detection. To address this critical issue, we propose a framework to detect and identify sensors that are under attack. Specifically, we first develop a new technique to detect attacks on a system that consists of three sensors. Our main idea is to: 1) use data from three sensors to obtain two versions of depth maps (i.e., disparity) and 2) detect attacks by analyzing the distribution of disparity errors. In our study, we use real data sets and the state-of-the-art machine learning model to evaluate our attack detection scheme and the results confirm the effectiveness of our detection method. Based on the detection scheme, we further develop an identification model that is capable of identifying up to n-2 attacked sensors in a system with one LiDAR and n cameras. We prove the correctness of our identification scheme and conduct experiments to show the accuracy of our identification method. Finally, we investigate the overall sensitivity of our framework.

Via

Access Paper or Ask Questions