Abstract: In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to ensure the ethical use of LLMs within prescribed boundaries. This article provides a systematic literature review of the current status of this critical mechanism, discusses its major challenges, and considers how it can be enhanced into a comprehensive mechanism dealing with ethical issues in various contexts. First, the paper elucidates the current landscape of safeguarding mechanisms employed by major LLM service providers and the open-source community. This is followed by techniques for evaluating, analyzing, and improving the (un)desirable properties that a guardrail might be expected to enforce, such as hallucination, fairness, and privacy. Building on these, we review techniques to circumvent such controls (i.e., attacks), to defend against these attacks, and to reinforce the guardrails. While the techniques above represent the current status and active research trends, we also discuss several challenges that cannot easily be addressed by existing methods, and present our vision of how a comprehensive guardrail could be implemented through full consideration of multi-disciplinary approaches, neural-symbolic methods, and the systems development lifecycle.
Abstract: The risk of reinforcing or even exacerbating societal biases and inequalities will increase significantly as generative AI increasingly produces useful artifacts, from text to images and beyond, for the real world. We address these issues by formally characterizing the notion of fairness for generative AI as a basis for monitoring and enforcing fairness. We define two levels of fairness using the notion of infinite sequences of abstractions of AI-generated artifacts such as text or images. The first is the fairness demonstrated on the generated sequences, which is evaluated only on the outputs while remaining agnostic to the prompts and models used. The second is the inherent fairness of the generative AI model, which requires that fairness be manifested when input prompts are neutral, that is, when they do not explicitly instruct the generative AI to produce a particular type of output. We also study relative intersectional fairness, together with lazy fairness enforcement, to counteract the combinatorial explosion that arises when multiple categories are considered jointly. Finally, fairness monitoring and enforcement are tested against several current generative AI models.
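To make the output-level notion concrete, the following is a minimal, hypothetical monitoring sketch (not the paper's formal definition): each generated artifact is abstracted to a category label, and the monitor checks that the empirical category frequencies of the generated sequence stay within a tolerance of a target distribution. The function names, the tolerance and warm-up parameters, and the perceived-gender abstraction are illustrative assumptions.

```python
# Illustrative sketch only: monitor output-level fairness of a stream of
# generated artifacts by abstracting each artifact to a category label and
# checking that empirical frequencies stay close to a target distribution.
from collections import Counter

def fairness_monitor(categories, target, tolerance=0.1, warmup=20):
    """Yield (index, is_fair) verdicts for a sequence of category labels.

    `target` maps each category to its desired long-run frequency;
    `tolerance` bounds the allowed deviation; `warmup` suppresses verdicts
    while the observed sample is still too small to be meaningful.
    """
    counts = Counter()
    for i, c in enumerate(categories, start=1):
        counts[c] += 1
        if i < warmup:
            yield i, True
            continue
        deviation = max(abs(counts[k] / i - p) for k, p in target.items())
        yield i, deviation <= tolerance

# Example: abstraction of generated images to a perceived-gender label.
stream = ["female", "male", "male", "male", "female", "male"] * 10
verdicts = list(fairness_monitor(stream, {"female": 0.5, "male": 0.5}))
print(verdicts[-1])  # (60, False): "male" is over-represented in this stream
```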
Abstract: Out-of-distribution (OoD) detection techniques for deep neural networks (DNNs) have become crucial because they filter abnormal inputs, especially when DNNs are used in safety-critical applications and interact with an open and dynamic environment. Nevertheless, integrating OoD detection into state-of-the-art (SOTA) object detection DNNs poses significant challenges, partly due to the complexity introduced by SOTA OoD construction methods, which require modifying the DNN architecture and introducing complex loss functions. This paper proposes a simple, yet surprisingly effective, method called Box Abstraction-based Monitors (BAM) that requires neither retraining nor architectural changes to the object detection DNN. The novelty of BAM stems from using a finite union of convex box abstractions to capture the learned features of objects for in-distribution (ID) data, together with the important observation that features from OoD data are more likely to fall outside these boxes. The union of convex regions within the feature space allows the formation of non-convex and interpretable decision boundaries, overcoming the limitations of VOS-like detectors without sacrificing real-time performance. Experiments integrating BAM into Faster R-CNN-based object detection DNNs demonstrate considerably improved performance over SOTA OoD detection techniques.
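A minimal sketch of the box-abstraction idea follows, assuming pre-extracted in-distribution feature vectors and standard k-means clustering; it is not the paper's Faster R-CNN integration, and the cluster count, margin, and synthetic data are illustrative.

```python
# Cluster in-distribution (ID) feature vectors, wrap each cluster in an
# axis-aligned box, and flag a new feature vector as OoD if it lies outside
# every box. The union of boxes yields a non-convex decision boundary.
import numpy as np
from sklearn.cluster import KMeans

def build_boxes(features, n_clusters=5):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return [(features[labels == k].min(axis=0), features[labels == k].max(axis=0))
            for k in range(n_clusters)]

def is_ood(x, boxes, margin=0.0):
    inside_any = any(np.all(x >= lo - margin) and np.all(x <= hi + margin)
                     for lo, hi in boxes)
    return not inside_any

rng = np.random.default_rng(0)
id_feats = rng.normal(0.0, 1.0, size=(500, 8))      # stand-in for ID features
boxes = build_boxes(id_feats)
print(is_ood(rng.normal(0.0, 1.0, size=8), boxes))   # likely False (ID-like)
print(is_ood(rng.normal(6.0, 1.0, size=8), boxes))   # likely True (far away)
```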
Abstract: In this paper, we present a novel digital twin prototype for a learning-enabled self-driving vehicle. The primary objective of this digital twin is to perform traffic sign recognition and lane keeping. The digital twin architecture relies on co-simulation and uses the Functional Mock-up Interface and SystemC Transaction Level Modeling standards. The digital twin consists of four clients: i) a vehicle model designed in the Amesim tool, ii) an environment model developed in Prescan, iii) a lane-keeping controller designed in the Robot Operating System, and iv) a perception and speed control module developed in the formal modeling language BIP (Behavior, Interaction, Priority). These clients interface with the digital twin platform, PAVE360-Veloce System Interconnect (PAVE360-VSI). PAVE360-VSI acts as the co-simulation orchestrator and is responsible for synchronization, interconnection, and data exchange through a server. The server establishes connections among the different clients and also ensures adherence to the Ethernet protocol. We conclude with illustrative digital twin simulations and recommendations for future work.
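The following is a deliberately generic, hypothetical sketch of the orchestration role described above: advance all clients in lock-step and route signals between them. It does not use the FMI, SystemC TLM, or PAVE360-VSI APIs; the `Client` interface and the wiring table are assumptions made for illustration only.

```python
# Generic lock-step co-simulation loop: the orchestrator steps every client by
# dt and routes produced signals to their consumers before the next step.
from dataclasses import dataclass, field

@dataclass
class Client:
    """Placeholder for a co-simulation client (vehicle, environment, ...)."""
    name: str
    state: dict = field(default_factory=dict)

    def step(self, dt, inputs):
        """Advance the client's local model by dt and return its outputs."""
        return {}

def orchestrate(clients, wiring, dt=0.01, t_end=1.0):
    """`wiring` maps (producer, signal) to (consumer, signal)."""
    signals, t = {}, 0.0
    while t < t_end:
        for c in clients:
            inputs = {dst_sig: signals.get((src, src_sig))
                      for (src, src_sig), (dst, dst_sig) in wiring.items()
                      if dst == c.name}
            for sig, val in c.step(dt, inputs).items():
                signals[(c.name, sig)] = val
        t += dt
    return signals

# Example wiring between two of the four clients (names are illustrative):
wiring = {("environment", "camera_frame"): ("perception", "image_in")}
orchestrate([Client("environment"), Client("perception")], wiring)
```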
Abstract: Machine learning has made remarkable advancements, but confidently utilising learning-enabled components in safety-critical domains still poses challenges. Among these challenges, finding a rigorous, yet practical, way of achieving safety guarantees is known to be one of the most prominent. In this paper, we first discuss the engineering and research challenges associated with the design and verification of such systems. Then, based on the observation that existing works cannot actually achieve provable guarantees, we promote a two-step verification method for ultimately achieving provable statistical guarantees.
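As a generic illustration of what a provable statistical guarantee can look like (not the two-step method promoted in the paper), Hoeffding's inequality bounds how many independent test runs suffice to estimate a failure probability within epsilon at confidence at least 1 - delta; the function names below are illustrative.

```python
# Hoeffding-style sample-size bound: n >= ln(2/delta) / (2 * epsilon**2)
# i.i.d. trials estimate a failure probability to within +/- epsilon with
# probability at least 1 - delta.
import math

def required_samples(epsilon, delta):
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

def estimate_failure_rate(run_test, n):
    """run_test() returns True when the learning-enabled component fails."""
    return sum(run_test() for _ in range(n)) / n

print(required_samples(0.01, 1e-3))  # 38005 samples for a +/-1%, 99.9% claim
```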
Abstract: Out-of-distribution (OoD) detection techniques are instrumental for safety-related neural networks. We argue, however, that current performance-oriented OoD detection techniques, geared towards metrics such as expected calibration error, are not sufficient for establishing safety claims. What is missing is a rigorous design approach for developing, verifying, and validating OoD detectors. These design principles need to be aligned with the intended functionality and the operational domain. Here, we formulate some of the key technical challenges, together with a possible way forward, for developing a rigorous and safety-related design methodology for OoD detectors.
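To make the performance-oriented evaluation concrete, a common formulation of expected calibration error (ECE) bins predictions by confidence and averages the gap between per-bin accuracy and per-bin confidence, weighted by bin size; the sketch below assumes binary correctness labels and equal-width bins.

```python
# Expected calibration error: bin by confidence, compare accuracy to mean
# confidence per bin, and weight each gap by the fraction of samples in the bin.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    confidences, correct = np.asarray(confidences), np.asarray(correct)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            gap = abs(correct[mask].mean() - confidences[mask].mean())
            ece += mask.mean() * gap
    return ece

print(expected_calibration_error([0.9, 0.8, 0.95, 0.6], [1, 0, 1, 1]))
```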
Abstract: Large Language Models (LLMs) have set off a new heatwave of AI thanks to their ability to engage end-users in human-level conversations with detailed and articulate answers across many knowledge domains. In response to their fast adoption in many industrial applications, this survey concerns their safety and trustworthiness. First, we review known vulnerabilities of LLMs, categorising them into inherent issues, intended attacks, and unintended bugs. Then, we consider if and how Verification and Validation (V&V) techniques, which have been widely developed for traditional software and for deep learning models such as convolutional neural networks, can be integrated and further extended throughout the lifecycle of LLMs to provide rigorous analysis of the safety and trustworthiness of LLMs and their applications. Specifically, we consider four complementary techniques: falsification and evaluation, verification, runtime monitoring, and ethical use. Considering the fast development of LLMs, this survey does not intend to be complete (although it includes 300 references), especially when it comes to the applications of LLMs in various domains; rather, it is a collection of organised literature reviews and discussions intended to support a quick understanding of the safety and trustworthiness issues from the perspective of V&V.
Abstract: For the safety assurance of deep neural networks (DNNs), out-of-distribution (OoD) monitoring techniques are essential, as they filter spurious inputs that are distant from the training dataset. This paper studies the problem of systematically testing OoD monitors to avoid cases where an input data point is judged as in-distribution by the monitor, yet the DNN produces a spurious output prediction. We consider a definition of "in-distribution" characterized in the feature space by a union of hyperrectangles learned from the training dataset. Thus, testing reduces to finding corners of hyperrectangles that are distant from the available training data in the feature space. Concretely, we encode the abstract location of every data point as a finite-length binary string, and the union of all binary strings is stored compactly using binary decision diagrams (BDDs). We demonstrate how to use BDDs to symbolically extract corners distant from all data points within the training set. Apart from test case generation, we explain how to use the proposed corners to fine-tune the DNN so that it does not make overly confident predictions. The approach is evaluated on examples such as number and traffic sign recognition.
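The following sketch illustrates the binary-string encoding and BDD storage under simplifying assumptions: features are bucketed into a small per-dimension grid, visited cells are accumulated in a BDD (using the open-source `dd` package), and the unvisited cells, candidates for corner-style test cases, are enumerated from its negation. The bucketing scheme, bit widths, and variable names are illustrative, not the paper's construction.

```python
# Store the binary-encoded grid cells visited by training features in a BDD,
# then enumerate unvisited cells as candidate test locations far from the data.
import numpy as np
from dd.autoref import BDD

def cell_bits(x, lo, hi, bits_per_dim=2):
    """Map a feature vector to a binary string: each dimension is bucketed
    into 2**bits_per_dim equal-width bins inside [lo, hi]."""
    n_bins = 2 ** bits_per_dim
    idx = np.clip(((x - lo) / (hi - lo + 1e-12) * n_bins).astype(int), 0, n_bins - 1)
    return "".join(format(int(i), f"0{bits_per_dim}b") for i in idx)

def build_visited_bdd(features, lo, hi, bits_per_dim=2):
    n_vars = features.shape[1] * bits_per_dim
    bdd = BDD()
    bdd.declare(*[f"b{i}" for i in range(n_vars)])
    visited = bdd.false
    for x in features:
        cube = bdd.true
        for i, b in enumerate(cell_bits(x, lo, hi, bits_per_dim)):
            var = bdd.var(f"b{i}")
            cube &= var if b == "1" else ~var
        visited |= cube
    return bdd, visited, n_vars

features = np.random.randn(100, 3)
bdd, visited, n_vars = build_visited_bdd(features, features.min(0), features.max(0))
# Unvisited cells are simply the negation of the visited set; each satisfying
# assignment of the negation corresponds to one abstract cell to test.
corners = list(bdd.pick_iter(~visited, care_vars=[f"b{i}" for i in range(n_vars)]))
print(f"{len(corners)} unvisited cells")
```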
Abstract: We present an approach for monitoring classification systems via data abstraction. Data abstraction relies on the notion of a box with a resolution. Box-based abstraction consists of representing a set of values by its minimal and maximal values in each dimension. We augment boxes with a notion of resolution and define their clustering coverage, which is, intuitively, a quantitative metric that indicates the abstraction quality. This allows studying the effect of different clustering parameters on the constructed boxes and estimating an interval of sub-optimal parameters. Moreover, we automatically construct monitors that leverage both the correct and incorrect behaviors of a system. This allows checking the size of the monitor abstractions and analyzing the separability of the network. Monitors are obtained by combining the sub-monitors of each class of the system, placed at selected layers. Our experiments demonstrate the effectiveness of our clustering coverage estimation and show how to assess the effectiveness and precision of monitors according to the selected clustering parameter and monitored layers.
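A hedged sketch of the per-class sub-monitor idea follows: boxes are built over the features of correctly and of incorrectly classified samples for one class at a selected layer, and a prediction is accepted only if its feature vector hits a "correct" box and misses every "incorrect" box. The resolution and clustering-coverage machinery is not reproduced here, and the cluster counts and synthetic data are illustrative.

```python
# Per-class sub-monitor leveraging both correct and incorrect behaviours.
import numpy as np
from sklearn.cluster import KMeans

def boxes(features, n_clusters=3):
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(features)
    return [(features[labels == k].min(0), features[labels == k].max(0))
            for k in range(n_clusters)]

def inside_any(x, bxs):
    return any(np.all(x >= lo) and np.all(x <= hi) for lo, hi in bxs)

class ClassMonitor:
    def __init__(self, correct_feats, incorrect_feats):
        self.good = boxes(correct_feats)
        self.bad = boxes(incorrect_feats) if len(incorrect_feats) >= 3 else []

    def accept(self, x):
        # Accept only if x falls in some "correct" box and in no "incorrect" box.
        return inside_any(x, self.good) and not inside_any(x, self.bad)

rng = np.random.default_rng(1)
mon = ClassMonitor(rng.normal(0, 1, (200, 4)), rng.normal(4, 1, (50, 4)))
print(mon.accept(rng.normal(0, 1, 4)), mon.accept(rng.normal(4, 1, 4)))
```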
Abstract: Autonomous robots are complex systems that require interaction and cooperation between numerous heterogeneous software components. In recent times, robots have been increasingly used for complex and safety-critical tasks, such as exploring Mars and assisting or replacing humans. Consequently, robots are becoming critical systems that must meet safety properties, in particular logical, temporal, and real-time constraints. To this end, we present an evolution of the LAAS architecture for autonomous systems, in particular its GenoM tool. This evolution relies on the BIP component-based design framework, which has been successfully used in other domains such as embedded systems. We show how we integrate BIP into our existing methodology for developing the lowest (functional) level of robots. Specifically, we discuss the componentization of the functional level, the synthesis of an execution controller for it, and how we verify that the resulting functional level conforms to properties such as deadlock-freedom. We also show through experimentation that the verification is feasible and usable for complex, real-world robotic systems, and that the BIP-based functional levels resulting from our new methodology are, despite an overhead during execution, still practical on real-world robotic platforms. Our approach has been fully implemented in the LAAS architecture, and the implementation has been used in several experiments on a real robot.