William MacLean

NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

Jun 06, 2024

Shuo Huang, William MacLean, Xiaoxi Kang, Anqi Wu, Lizhen Qu, Qiongkai Xu, Zhuang Li, Xingliang Yuan, Gholamreza Haffari

Figures 1 to 4 for NAP^2: A Benchmark for Naturalness and Privacy-Preserving Text Rewriting by Learning from Human

Abstract: Increasing concerns about privacy leakage arise in academia and industry when NLP models from third-party providers are employed to process sensitive texts. To protect privacy before sensitive data is sent to those models, we suggest sanitizing sensitive text using two common strategies used by humans: i) deleting sensitive expressions, and ii) obscuring sensitive details by abstracting them. To explore the issues and develop a tool for text rewriting, we curate the first corpus, coined NAP^2, through both crowdsourcing and the use of large language models (LLMs). Compared to prior works based on differential privacy, which lead to a sharp drop in information utility and produce unnatural texts, the human-inspired approaches yield more natural rewrites and offer a better balance between privacy protection and data utility, as demonstrated by our extensive experiments.
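The two human-inspired strategies (deletion and abstraction) can be illustrated with a toy sketch. It assumes the sensitive spans and their coarser abstractions are already known. In NAP^2 the rewrites come from crowdworkers and LLMs, not from a mechanical rule like this one. The names `SensitiveSpan` and `sanitize`, as well as the example sentence, are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensitiveSpan:
    # Hypothetical annotation type, not part of the NAP^2 release.
    start: int        # character offset where the sensitive expression begins
    end: int          # character offset one past its last character
    abstraction: str  # coarser replacement, e.g. " at a hospital"


def sanitize(text: str, spans: list[SensitiveSpan], strategy: str) -> str:
    """Toy rewrite: delete each sensitive span or replace it with its abstraction."""
    if strategy not in ("delete", "abstract"):
        raise ValueError(f"unknown strategy: {strategy!r}")
    pieces = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        if span.start < cursor:
            raise ValueError("overlapping spans are not supported")
        pieces.append(text[cursor:span.start])  # keep non-sensitive text as-is
        if strategy == "abstract":
            pieces.append(span.abstraction)     # obscure the detail instead of dropping it
        cursor = span.end
    pieces.append(text[cursor:])
    # Collapse any doubled whitespace left behind by deletions.
    return " ".join("".join(pieces).split())


text = "Last week at Royal Melbourne Hospital I was diagnosed with diabetes."
phrase = " at Royal Melbourne Hospital"
start = text.index(phrase)
spans = [SensitiveSpan(start, start + len(phrase), " at a hospital")]

print(sanitize(text, spans, "delete"))    # Last week I was diagnosed with diabetes.
print(sanitize(text, spans, "abstract"))  # Last week at a hospital I was diagnosed with diabetes.
```

Here the span deliberately includes the leading preposition so that deleting it still leaves a grammatical sentence; choosing such boundaries automatically is outside the scope of this sketch.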
