Title: Can LLMs Separate Instructions From Data? And What Do We Even Mean By That?

Mar 11, 2024

Egor Zverev, Sahar Abdelnabi, Mario Fritz, Christoph H. Lampert

Abstract: Instruction-tuned Large Language Models (LLMs) have achieved breakthrough results, opening countless new possibilities for many practical applications. However, LLMs lack elementary safety features that are established norms in other areas of computer science, such as the separation between instructions and data. This causes them to malfunction or renders them vulnerable to manipulation and interference by third parties, e.g., via indirect prompt/command injection. Even worse, so far there is not even an established definition of what precisely such a separation would mean and how its violation could be tested. In this work, we aim to close this gap. We introduce a formal measure to quantify the phenomenon of instruction-data separation, as well as an empirical variant of the measure that can be computed from a model's black-box outputs. We also introduce a new dataset, SEP (Should it be Executed or Processed?), which allows estimating the measure, and we report results on several state-of-the-art open-source and closed LLMs. Finally, we quantitatively demonstrate that all evaluated LLMs fail to achieve a high amount of separation, according to our measure. The source code and SEP dataset are openly accessible at https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed.
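The abstract does not spell out how the empirical measure is computed, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each test case inserts a probe once on the instruction side and once on the data side of the prompt, and that a "witness" string in the output reveals whether the probe was executed. The names `ProbeResult` and `empirical_separation` are hypothetical and used purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    """One test case (hypothetical structure, assumed for illustration)."""
    witness: str                      # string that appears only if the probe was executed
    output_probe_in_instruction: str  # model output with the probe on the instruction side
    output_probe_in_data: str         # model output with the probe on the data side


def _executed(witness: str, output: str) -> bool:
    # Assumption: a case-insensitive substring match signals execution.
    return witness.lower() in output.lower()


def empirical_separation(results: List[ProbeResult]) -> Optional[float]:
    """Share of cases where the probe is executed as an instruction
    but NOT executed when it appears in the data.

    Only cases where the model executes the probe on the instruction side
    are counted; returns None if there are no such cases.
    """
    followed = [
        r for r in results
        if _executed(r.witness, r.output_probe_in_instruction)
    ]
    if not followed:
        return None
    separated = sum(
        1 for r in followed
        if not _executed(r.witness, r.output_probe_in_data)
    )
    return separated / len(followed)
```

Under these assumptions, a score near 1 would mean the model executes probes only when they arrive as instructions, while a score near 0 would mean it executes them regardless of where they appear. The dataset format and scoring code in the linked repository may differ from this sketch.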

* Accepted for the ICLR 2024 Workshop on Secure and Trustworthy Large Language Models. GitHub: https://github.com/egozverev/Shold-It-Be-Executed-Or-Processed (5 pages main text, 17 pages in total)
