Abstract:Federated Learning (FL) exhibits privacy vulnerabilities under gradient inversion attacks (GIAs), which can extract private information from individual gradients. To enhance privacy, FL incorporates Secure Aggregation (SA) to prevent the server from obtaining individual gradients, thus effectively resisting GIAs. In this paper, we propose a stealthy label inference attack to bypass SA and recover individual clients' private labels. Specifically, we conduct a theoretical analysis of label inference from the aggregated gradients that are exclusively obtained after implementing SA. The analysis results reveal that the inputs (embeddings) and outputs (logits) of the final fully connected layer (FCL) contribute to gradient disaggregation and label restoration. To preset the embeddings and logits of FCL, we craft a fishing model by solely modifying the parameters of a single batch normalization (BN) layer in the original model. Distributing client-specific fishing models, the server can derive the individual gradients regarding the bias of FCL by resolving a linear system with expected embeddings and the aggregated gradients as coefficients. Then the labels of each client can be precisely computed based on preset logits and gradients of FCL's bias. Extensive experiments show that our attack achieves large-scale label recovery with 100\% accuracy on various datasets and model architectures.
Abstract:Language models (LMs) are susceptible to "memorizing" training data, including a large amount of private or copyright-protected content. To safeguard the right to be forgotten (RTBF), machine unlearning has emerged as a promising method for LMs to efficiently "forget" sensitive training content and mitigate knowledge leakage risks. However, despite its good intentions, could the unlearning mechanism be counterproductive? In this paper, we propose the Textual Unlearning Leakage Attack (TULA), where an adversary can infer information about the unlearned data only by accessing the models before and after unlearning. Furthermore, we present variants of TULA in both black-box and white-box scenarios. Through various experimental results, we critically demonstrate that machine unlearning amplifies the risk of knowledge leakage from LMs. Specifically, TULA can increase an adversary's ability to infer membership information about the unlearned data by more than 20% in black-box scenario. Moreover, TULA can even reconstruct the unlearned data directly with more than 60% accuracy with white-box access. Our work is the first to reveal that machine unlearning in LMs can inversely create greater knowledge risks and inspire the development of more secure unlearning mechanisms.
Abstract:Federated learning (FL) enables collaborative model training among multiple clients without raw data exposure. However, recent studies have shown that clients' private training data can be reconstructed from the gradients they share in FL, known as gradient inversion attacks (GIAs). While GIAs have demonstrated effectiveness under \emph{ideal settings and auxiliary assumptions}, their actual efficacy against \emph{practical FL systems} remains under-explored. To address this gap, we conduct a comprehensive study on GIAs in this work. We start with a survey of GIAs that establishes a milestone to trace their evolution and develops a systematization to uncover their inherent threats. Specifically, we categorize the auxiliary assumptions used by existing GIAs based on their practical accessibility to potential adversaries. To facilitate deeper analysis, we highlight the challenges that GIAs face in practical FL systems from three perspectives: \textit{local training}, \textit{model}, and \textit{post-processing}. We then perform extensive theoretical and empirical evaluations of state-of-the-art GIAs across diverse settings, utilizing eight datasets and thirteen models. Our findings indicate that GIAs have inherent limitations when reconstructing data under practical local training settings. Furthermore, their efficacy is sensitive to the trained model, and even simple post-processing measures applied to gradients can be effective defenses. Overall, our work provides crucial insights into the limited effectiveness of GIAs in practical FL systems. By rectifying prior misconceptions, we hope to inspire more accurate and realistic investigations on this topic.