Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Kevin Baum

Saarland University, Germany

Soft Begging: Modular and Efficient Shielding of LLMs against Prompt Injection and Jailbreaking based on Prompt Tuning

Jul 03, 2024

Simon Ostermann, Kevin Baum, Christoph Endres, Julia Masloh, Patrick Schramowski

Abstract:Prompt injection (both direct and indirect) and jailbreaking are now recognized as significant issues for large language models (LLMs), particularly due to their potential for harm in application-integrated contexts. This extended abstract explores a novel approach to protecting LLMs from such attacks, termed "soft begging." This method involves training soft prompts to counteract the effects of corrupted prompts on the LLM's output. We provide an overview of prompt injections and jailbreaking, introduce the theoretical basis of the "soft begging" technique, and discuss an evaluation of its effectiveness.

Via

Access Paper or Ask Questions

Software Doping Analysis for Human Oversight

Aug 11, 2023

Sebastian Biewer, Kevin Baum, Sarah Sterz, Holger Hermanns, Sven Hetmank, Markus Langer, Anne Lauber-Rönsberg, Franz Lehr

Abstract:This article introduces a framework that is meant to assist in mitigating societal risks that software can pose. Concretely, this encompasses facets of software doping as well as unfairness and discrimination in high-risk decision-making systems. The term software doping refers to software that contains surreptitiously added functionality that is against the interest of the user. A prominent example of software doping are the tampered emission cleaning systems that were found in millions of cars around the world when the diesel emissions scandal surfaced. The first part of this article combines the formal foundations of software doping analysis with established probabilistic falsification techniques to arrive at a black-box analysis technique for identifying undesired effects of software. We apply this technique to emission cleaning systems in diesel cars but also to high-risk systems that evaluate humans in a possibly unfair or discriminating way. We demonstrate how our approach can assist humans-in-the-loop to make better informed and more responsible decisions. This is to promote effective human oversight, which will be a central requirement enforced by the European Union's upcoming AI Act. We complement our technical contribution with a juridically, philosophically, and psychologically informed perspective on the potential problems caused by such systems.

* Submitted to Formal Methods in System Design, special issue for FASE 2022

Via

Access Paper or Ask Questions

What Do We Want From Explainable Artificial Intelligence ? -- A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research

Feb 15, 2021

Markus Langer, Daniel Oster, Timo Speith, Holger Hermanns, Lena Kästner, Eva Schmidt, Andreas Sesing, Kevin Baum

Figure 1 for What Do We Want From Explainable Artificial Intelligence ? -- A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research

Figure 2 for What Do We Want From Explainable Artificial Intelligence ? -- A Stakeholder Perspective on XAI and a Conceptual Model Guiding Interdisciplinary XAI Research

Abstract:Previous research in Explainable Artificial Intelligence (XAI) suggests that a main aim of explainability approaches is to satisfy specific interests, goals, expectations, needs, and demands regarding artificial systems (we call these stakeholders' desiderata) in a variety of contexts. However, the literature on XAI is vast, spreads out across multiple largely disconnected disciplines, and it often remains unclear how explainability approaches are supposed to achieve the goal of satisfying stakeholders' desiderata. This paper discusses the main classes of stakeholders calling for explainability of artificial systems and reviews their desiderata. We provide a model that explicitly spells out the main concepts and relations necessary to consider and investigate when evaluating, adjusting, choosing, and developing explainability approaches that aim to satisfy stakeholders' desiderata. This model can serve researchers from the variety of different disciplines involved in XAI as a common ground. It emphasizes where there is interdisciplinary potential in the evaluation and the development of explainability approaches.

* 57 pages, 2 figures, 1 table, to be published in Artificial Intelligence, Markus Langer, Daniel Oster and Timo Speith share first-authorship of this paper

Via

Access Paper or Ask Questions

Towards a Framework Combining Machine Ethics and Machine Explainability

Jan 03, 2019

Kevin Baum, Holger Hermanns, Timo Speith

Figure 1 for Towards a Framework Combining Machine Ethics and Machine Explainability

Figure 2 for Towards a Framework Combining Machine Ethics and Machine Explainability

Abstract:We find ourselves surrounded by a rapidly increasing number of autonomous and semi-autonomous systems. Two grand challenges arise from this development: Machine Ethics and Machine Explainability. Machine Ethics, on the one hand, is concerned with behavioral constraints for systems, so that morally acceptable, restricted behavior results; Machine Explainability, on the other hand, enables systems to explain their actions and argue for their decisions, so that human users can understand and justifiably trust them. In this paper, we try to motivate and work towards a framework combining Machine Ethics and Machine Explainability. Starting from a toy example, we detect various desiderata of such a framework and argue why they should and how they could be incorporated in autonomous systems. Our main idea is to apply a framework of formal argumentation theory both, for decision-making under ethical constraints and for the task of generating useful explanations given only limited knowledge of the world. The result of our deliberations can be described as a first version of an ethically motivated, principle-governed framework combining Machine Ethics and Machine Explainability

* EPTCS 286, 2019, pp. 34-49
* In Proceedings CREST 2018, arXiv:1901.00073

Via

Access Paper or Ask Questions