Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Shagufta Mehnaz

Disparate Privacy Vulnerability: Targeted Attribute Inference Attacks and Defenses

Apr 05, 2025

Ehsanul Kabir, Lucas Craig, Shagufta Mehnaz

Abstract:As machine learning (ML) technologies become more prevalent in privacy-sensitive areas like healthcare and finance, eventually incorporating sensitive information in building data-driven algorithms, it is vital to scrutinize whether these data face any privacy leakage risks. One potential threat arises from an adversary querying trained models using the public, non-sensitive attributes of entities in the training data to infer their private, sensitive attributes, a technique known as the attribute inference attack. This attack is particularly deceptive because, while it may perform poorly in predicting sensitive attributes across the entire dataset, it excels at predicting the sensitive attributes of records from a few vulnerable groups, a phenomenon known as disparate vulnerability. This paper illustrates that an adversary can take advantage of this disparity to carry out a series of new attacks, showcasing a threat level beyond previous imagination. We first develop a novel inference attack called the disparity inference attack, which targets the identification of high-risk groups within the dataset. We then introduce two targeted variations of the attribute inference attack that can identify and exploit a vulnerable subset of the training data, marking the first instances of targeted attacks in this category, achieving significantly higher accuracy than untargeted versions. We are also the first to introduce a novel and effective disparity mitigation technique that simultaneously preserves model performance and prevents any risk of targeted attacks.

* Selected for publication at 34th USENIX Security Symposium

Via

Access Paper or Ask Questions

Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage

Aug 30, 2024

Md Rafi Ur Rashid, Jing Liu, Toshiaki Koike-Akino, Shagufta Mehnaz, Ye Wang

Abstract:Fine-tuning large language models on private data for downstream applications poses significant privacy risks in potentially exposing sensitive information. Several popular community platforms now offer convenient distribution of a large variety of pre-trained models, allowing anyone to publish without rigorous verification. This scenario creates a privacy threat, as pre-trained models can be intentionally crafted to compromise the privacy of fine-tuning datasets. In this study, we introduce a novel poisoning technique that uses model-unlearning as an attack tool. This approach manipulates a pre-trained language model to increase the leakage of private data during the fine-tuning process. Our method enhances both membership inference and data extraction attacks while preserving model utility. Experimental results across different models, datasets, and fine-tuning setups demonstrate that our attacks significantly surpass baseline performance. This work serves as a cautionary note for users who download pre-trained models from unverified sources, highlighting the potential risks involved.

Via

Access Paper or Ask Questions

Second-Order Information Matters: Revisiting Machine Unlearning for Large Language Models

Mar 13, 2024

Kang Gu, Md Rafi Ur Rashid, Najrin Sultana, Shagufta Mehnaz

Abstract:With the rapid development of Large Language Models (LLMs), we have witnessed intense competition among the major LLM products like ChatGPT, LLaMa, and Gemini. However, various issues (e.g. privacy leakage and copyright violation) of the training corpus still remain underexplored. For example, the Times sued OpenAI and Microsoft for infringing on its copyrights by using millions of its articles for training. From the perspective of LLM practitioners, handling such unintended privacy violations can be challenging. Previous work addressed the ``unlearning" problem of LLMs using gradient information, while they mostly introduced significant overheads like data preprocessing or lacked robustness. In this paper, contrasting with the methods based on first-order information, we revisit the unlearning problem via the perspective of second-order information (Hessian). Our unlearning algorithms, which are inspired by classic Newton update, are not only data-agnostic/model-agnostic but also proven to be robust in terms of utility preservation or privacy guarantee. Through a comprehensive evaluation with four NLP datasets as well as a case study on real-world datasets, our methods consistently show superiority over the first-order methods.

Via

Access Paper or Ask Questions

FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Oct 24, 2023

Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Kang Gu, Najrin Sultana, Shagufta Mehnaz

Figure 1 for FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Figure 2 for FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Figure 3 for FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Figure 4 for FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering

Abstract:Federated learning (FL) is becoming a key component in many technology-based applications including language modeling -- where individual FL participants often have privacy-sensitive text data in their local datasets. However, realizing the extent of privacy leakage in federated language models is not straightforward and the existing attacks only intend to extract data regardless of how sensitive or naive it is. To fill this gap, in this paper, we introduce two novel findings with regard to leaking privacy-sensitive user data from federated language models. Firstly, we make a key observation that model snapshots from the intermediate rounds in FL can cause greater privacy leakage than the final trained model. Secondly, we identify that privacy leakage can be aggravated by tampering with a model's selective weights that are specifically responsible for memorizing the sensitive training data. We show how a malicious client can leak the privacy-sensitive data of some other user in FL even without any cooperation from the server. Our best-performing method improves the membership inference recall by 29% and achieves up to 70% private data reconstruction, evidently outperforming existing attacks with stronger assumptions of adversary capabilities.

* 22 pages (including bibliography and Appendix), Submitted to USENIX Security '24

Via

Access Paper or Ask Questions

FLShield: A Validation Based Federated Learning Framework to Defend Against Poisoning Attacks

Aug 10, 2023

Ehsanul Kabir, Zeyu Song, Md Rafi Ur Rashid, Shagufta Mehnaz

Abstract:Federated learning (FL) is revolutionizing how we learn from data. With its growing popularity, it is now being used in many safety-critical domains such as autonomous vehicles and healthcare. Since thousands of participants can contribute in this collaborative setting, it is, however, challenging to ensure security and reliability of such systems. This highlights the need to design FL systems that are secure and robust against malicious participants' actions while also ensuring high utility, privacy of local data, and efficiency. In this paper, we propose a novel FL framework dubbed as FLShield that utilizes benign data from FL participants to validate the local models before taking them into account for generating the global model. This is in stark contrast with existing defenses relying on server's access to clean datasets -- an assumption often impractical in real-life scenarios and conflicting with the fundamentals of FL. We conduct extensive experiments to evaluate our FLShield framework in different settings and demonstrate its effectiveness in thwarting various types of poisoning and backdoor attacks including a defense-aware one. FLShield also preserves privacy of local data against gradient inversion attacks.

Via

Access Paper or Ask Questions

Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Jan 23, 2022

Shagufta Mehnaz, Sayanton V. Dibbo, Ehsanul Kabir, Ninghui Li, Elisa Bertino

Figure 1 for Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Figure 2 for Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Figure 3 for Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Figure 4 for Are Your Sensitive Attributes Private? Novel Model Inversion Attribute Inference Attacks on Classification Models

Abstract:Increasing use of machine learning (ML) technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakage of sensitive and proprietary training data. In this paper, we focus on model inversion attacks where the adversary knows non-sensitive attributes about records in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using only black-box access to the target classification model. We first devise a novel confidence score-based model inversion attribute inference attack that significantly outperforms the state-of-the-art. We then introduce a label-only model inversion attack that relies only on the model's predicted labels but still matches our confidence score-based attack in terms of attack effectiveness. We also extend our attacks to the scenario where some of the other (non-sensitive) attributes of a target record are unknown to the adversary. We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained on three real datasets. Moreover, we empirically demonstrate the disparate vulnerability of model inversion attacks, i.e., specific groups in the training dataset (grouped by gender, race, etc.) could be more vulnerable to model inversion attacks.

* Conditionally accepted to USENIX Security 2022. This is not the camera-ready version. arXiv admin note: substantial text overlap with arXiv:2012.03404

Via

Access Paper or Ask Questions

Black-box Model Inversion Attribute Inference Attacks on Classification Models

Dec 07, 2020

Shagufta Mehnaz, Ninghui Li, Elisa Bertino

Figure 1 for Black-box Model Inversion Attribute Inference Attacks on Classification Models

Figure 2 for Black-box Model Inversion Attribute Inference Attacks on Classification Models

Figure 3 for Black-box Model Inversion Attribute Inference Attacks on Classification Models

Figure 4 for Black-box Model Inversion Attribute Inference Attacks on Classification Models

Abstract:Increasing use of ML technologies in privacy-sensitive domains such as medical diagnoses, lifestyle predictions, and business decisions highlights the need to better understand if these ML technologies are introducing leakages of sensitive and proprietary training data. In this paper, we focus on one kind of model inversion attacks, where the adversary knows non-sensitive attributes about instances in the training data and aims to infer the value of a sensitive attribute unknown to the adversary, using oracle access to the target classification model. We devise two novel model inversion attribute inference attacks -- confidence modeling-based attack and confidence score-based attack, and also extend our attack to the case where some of the other (non-sensitive) attributes are unknown to the adversary. Furthermore, while previous work uses accuracy as the metric to evaluate the effectiveness of attribute inference attacks, we find that accuracy is not informative when the sensitive attribute distribution is unbalanced. We identify two metrics that are better for evaluating attribute inference attacks, namely G-mean and Matthews correlation coefficient (MCC). We evaluate our attacks on two types of machine learning models, decision tree and deep neural network, trained with two real datasets. Experimental results show that our newly proposed attacks significantly outperform the state-of-the-art attacks. Moreover, we empirically show that specific groups in the training dataset (grouped by attributes, e.g., gender, race) could be more vulnerable to model inversion attacks. We also demonstrate that our attacks' performances are not impacted significantly when some of the other (non-sensitive) attributes are also unknown to the adversary.

Via

Access Paper or Ask Questions