Title: KnowledgeSG: Privacy-Preserving Synthetic Text Generation with Knowledge Distillation from Server

Oct 10, 2024

Wenhao Wang, Xiaoyu Liang, Rui Ye, Jingyi Chai, Siheng Chen, Yanfeng Wang

Abstract: The success of large language models (LLMs) has led many parties to fine-tune LLMs on their own private data. However, this practice raises privacy concerns because LLMs can memorize their training data. Existing solutions, such as substituting synthetic data for the private data, struggle to improve performance and preserve privacy at the same time. They either rely on a local model for generation, which degrades performance, or use external APIs, which directly exposes the data to the API servers. To address this issue, we propose KnowledgeSG, a novel client-server framework that enhances synthetic data quality and improves model performance while ensuring privacy. We achieve this by learning local knowledge from the private data with differential privacy (DP) and distilling professional knowledge from the server. Additionally, inspired by federated learning, we transmit models rather than data between the client and server to prevent privacy leakage. Extensive experiments in the medical and financial domains demonstrate the effectiveness of KnowledgeSG. Our code is publicly available at https://github.com/wwh0411/KnowledgeSG.
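The abstract does not say which DP training procedure the client uses. As a rough illustration only, the sketch below shows one step of DP-SGD, a common way to train with differential privacy: each per-example gradient is clipped to a fixed L2 norm, Gaussian noise scaled to that norm is added to the sum, and the average drives the update. DP-SGD itself, the function names, and the parameters `max_norm` and `noise_multiplier` are assumptions made for this example, not details taken from the paper or its repository.

```python
import math
import random


def clip_gradient(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum them, add Gaussian noise, average."""
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip_gradient(grad, max_norm)
        total = [t + c for t, c in zip(total, clipped)]
    # Noise standard deviation is tied to the clipping bound (the sensitivity).
    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


def dp_sgd_step(weights, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """Apply one plain SGD update using the privatized average gradient."""
    grad = dp_average_gradient(per_example_grads, max_norm, noise_multiplier, rng)
    return [w - lr * g for w, g in zip(weights, grad)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy values: two weights, two per-example gradients.
    weights = [1.0, 1.0]
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(dp_sgd_step(weights, grads, lr=0.1, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng))
```

Clipping bounds how much any single private example can move the model, and the noise masks what remains of that influence. In practice a library such as Opacus would handle per-example gradients and privacy accounting for a real LLM; this toy version only shows the arithmetic.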

* EMNLP 2024 Main
