Abstract:As Large Language Models (LLMs) are deployed and integrated into thousands of applications, the need for scalable evaluation of how models respond to adversarial attacks grows rapidly. However, LLM security is a moving target: models produce unpredictable output, are constantly updated, and the potential adversary is highly diverse: anyone with access to the internet and a decent command of natural language. Further, what constitutes a security weak in one context may not be an issue in a different context; one-fits-all guardrails remain theoretical. In this paper, we argue that it is time to rethink what constitutes ``LLM security'', and pursue a holistic approach to LLM security evaluation, where exploration and discovery of issues are central. To this end, this paper introduces garak (Generative AI Red-teaming and Assessment Kit), a framework which can be used to discover and identify vulnerabilities in a target LLM or dialog system. garak probes an LLM in a structured fashion to discover potential vulnerabilities. The outputs of the framework describe a target model's weaknesses, contribute to an informed discussion of what composes vulnerabilities in unique contexts, and can inform alignment and policy discussions for LLM deployment.
Abstract:The recent leaps in complexity and fluency of Large Language Models (LLMs) mean that, for the first time in human history, people can interact with computers using natural language alone. This creates monumental possibilities of automation and accessibility of computing, but also raises severe security and safety threats: When everyone can interact with LLMs, everyone can potentially break into the systems running LLMs. All it takes is creative use of language. This paper presents Hacc-Man, a game which challenges its players to "jailbreak" an LLM: subvert the LLM to output something that it is not intended to. Jailbreaking is at the intersection between creative problem solving and LLM security. The purpose of the game is threefold: 1. To heighten awareness of the risks of deploying fragile LLMs in everyday systems, 2. To heighten people's self-efficacy in interacting with LLMs, and 3. To discover the creative problem solving strategies, people deploy in this novel context.
Abstract:Engaging in the deliberate generation of abnormal outputs from large language models (LLMs) by attacking them is a novel human activity. This paper presents a thorough exposition of how and why people perform such attacks. Using a formal qualitative methodology, we interviewed dozens of practitioners from a broad range of backgrounds, all contributors to this novel work of attempting to cause LLMs to fail. We relate and connect this activity between its practitioners' motivations and goals; the strategies and techniques they deploy; and the crucial role the community plays. As a result, this paper presents a grounded theory of how and why people attack large language models: LLM red teaming in the wild.
Abstract:Generative AI, i.e., the group of technologies that automatically generate visual or written content based on text prompts, has undergone a leap in complexity and become widely available within just a few years. Such technologies potentially introduce a massive disruption to creative fields. This paper presents the results of a qualitative survey ($N$ = 23) investigating how creative professionals think about generative AI. The results show that the advancement of these AI models prompts important reflections on what defines creativity and how creatives imagine using AI to support their workflows. Based on these reflections, we discuss how we might design \textit{participatory AI} in the domain of creative expertise with the goal of empowering creative professionals in their present and future coexistence with AI.
Abstract:Misinformation spread presents a technological and social threat to society. With the advance of AI-based language models, automatically generated texts have become difficult to identify and easy to create at scale. We present "The Rumour Mill", a playful art piece, designed as a commentary on the spread of rumours and automatically-generated misinformation. The mill is a tabletop interactive machine, which invites a user to experience the process of creating believable text by interacting with different tangible controls on the mill. The user manipulates visible parameters to adjust the genre and type of an automatically generated text rumour. The Rumour Mill is a physical demonstration of the state of current technology and its ability to generate and manipulate natural language text, and of the act of starting and spreading rumours.