Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Huseyin A Inan

Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Mar 04, 2024

Chulin Xie, Zinan Lin, Arturs Backurs, Sivakanth Gopi, Da Yu, Huseyin A Inan, Harsha Nori, Haotian Jiang, Huishuai Zhang, Yin Tat Lee(+2 more)

Figure 1 for Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Figure 2 for Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Figure 3 for Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Figure 4 for Differentially Private Synthetic Data via Foundation Model APIs 2: Text

Abstract:Text data has become extremely valuable due to the emergence of machine learning algorithms that learn from it. A lot of high-quality text data generated in the real world is private and therefore cannot be shared or used freely due to privacy concerns. Generating synthetic replicas of private text data with a formal privacy guarantee, i.e., differential privacy (DP), offers a promising and scalable solution. However, existing methods necessitate DP finetuning of large language models (LLMs) on private data to generate DP synthetic data. This approach is not viable for proprietary LLMs (e.g., GPT-3.5) and also demands considerable computational resources for open-source LLMs. Lin et al. (2024) recently introduced the Private Evolution (PE) algorithm to generate DP synthetic images with only API access to diffusion models. In this work, we propose an augmented PE algorithm, named Aug-PE, that applies to the complex setting of text. We use API access to an LLM and generate DP synthetic text without any model training. We conduct comprehensive experiments on three benchmark datasets. Our results demonstrate that Aug-PE produces DP synthetic text that yields competitive utility with the SOTA DP finetuning baselines. This underscores the feasibility of relying solely on API access of LLMs to produce high-quality DP synthetic texts, thereby facilitating more accessible routes to privacy-preserving LLM applications. Our code and data are available at https://github.com/AI-secure/aug-pe.

Via

Access Paper or Ask Questions

Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

Oct 20, 2023

Ruixiang Tang, Gord Lueck, Rodolfo Quispe, Huseyin A Inan, Janardhan Kulkarni, Xia Hu

Figure 1 for Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

Figure 2 for Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

Figure 3 for Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

Figure 4 for Assessing Privacy Risks in Language Models: A Case Study on Summarization Tasks

Abstract:Large language models have revolutionized the field of NLP by achieving state-of-the-art performance on various tasks. However, there is a concern that these models may disclose information in the training data. In this study, we focus on the summarization task and investigate the membership inference (MI) attack: given a sample and black-box access to a model's API, it is possible to determine if the sample was part of the training data. We exploit text similarity and the model's resistance to document modifications as potential MI signals and evaluate their effectiveness on widely used datasets. Our results demonstrate that summarization models are at risk of exposing data membership, even in cases where the reference summary is not available. Furthermore, we discuss several safeguards for training summarization models to protect against MI attacks and discuss the inherent trade-off between privacy and utility.

Via

Access Paper or Ask Questions

Differentially Private Model Compression

Jun 03, 2022

Fatemehsadat Mireshghallah, Arturs Backurs, Huseyin A Inan, Lukas Wutschitz, Janardhan Kulkarni

Figure 1 for Differentially Private Model Compression

Figure 2 for Differentially Private Model Compression

Figure 3 for Differentially Private Model Compression

Figure 4 for Differentially Private Model Compression

Abstract:Recent papers have shown that large pre-trained language models (LLMs) such as BERT, GPT-2 can be fine-tuned on private data to achieve performance comparable to non-private models for many downstream Natural Language Processing (NLP) tasks while simultaneously guaranteeing differential privacy. The inference cost of these models -- which consist of hundreds of millions of parameters -- however, can be prohibitively large. Hence, often in practice, LLMs are compressed before they are deployed in specific applications. In this paper, we initiate the study of differentially private model compression and propose frameworks for achieving 50% sparsity levels while maintaining nearly full performance. We demonstrate these ideas on standard GLUE benchmarks using BERT models, setting benchmarks for future research on this topic.

Via

Access Paper or Ask Questions