Title: The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation

Feb 23, 2024

Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang (+1 more)

Abstract: Retrieval-augmented generation (RAG) is a powerful technique for augmenting language models with proprietary and private data, where data privacy is a pivotal concern. Whereas extensive research has demonstrated the privacy risks of large language models (LLMs), the RAG technique could potentially reshape the inherent behaviors of LLM generation, posing new privacy issues that are currently under-explored. In this work, we conduct extensive empirical studies with novel attack methods, which demonstrate the vulnerability of RAG systems to leaking the private retrieval database. Despite the new risk that RAG brings to the retrieval data, we further reveal that RAG can mitigate the leakage of the LLMs' training data. Overall, we provide new insights in this paper for privacy protection of retrieval-augmented LLMs, which benefit builders of both LLMs and RAG systems. Our code is available at https://github.com/phycholosogy/RAG-privacy.
