Abstract:Reinforcement learning (RL) is not yet competitive for many cyber-physical systems, such as robotics, process automation, and power systems, as training on a system with physical components cannot be accelerated, and simulation models do not exist or suffer from a large simulation-to-reality gap. During the long training time, expensive equipment cannot be used and might even be damaged due to inappropriate actions of the reinforcement learning agent. Our novel approach addresses exactly this problem: We train the reinforcement agent in a so-called shadow mode with the assistance of an existing conventional controller, which does not have to be trained and instantaneously performs reasonably well. In shadow mode, the agent relies on the controller to provide action samples and guidance towards favourable states to learn the task, while simultaneously estimating for which states the learned agent will receive a higher reward than the conventional controller. The RL agent will then control the system for these states and all other regions remain under the control of the existing controller. Over time, the RL agent will take over for an increasing amount of states, while leaving control to the baseline, where it cannot surpass its performance. Thus, we keep regret during training low and improve the performance compared to only using conventional controllers or reinforcement learning. We present and evaluate two mechanisms for deciding whether to use the RL agent or the conventional controller. The usefulness of our approach is demonstrated for a reach-avoid task, for which we are able to effectively train an agent, where standard approaches fail.
Abstract:Formal verification of robotic tasks requires a simple yet conformant model of the used robot. We present the first work on generating reachset conformant models for robotic contact tasks considering hybrid (mixed continuous and discrete) dynamics. Reachset conformance requires that the set of reachable outputs of the abstract model encloses all previous measurements to transfer safety properties. Aiming for industrial applications, we describe the system using a simple hybrid automaton with linear dynamics. We inject non-determinism into the continuous dynamics and the discrete transitions, and we optimally identify all model parameters together with the non-determinism required to capture the recorded behaviors. Using two 3-DOF robots, we show that our approach can effectively generate models to capture uncertainties in system behavior and substantially reduce the required testing effort in industrial applications.
Abstract:Reinforcement learning often uses neural networks to solve complex control tasks. However, neural networks are sensitive to input perturbations, which makes their deployment in safety-critical environments challenging. This work lifts recent results from formally verifying neural networks against such disturbances to reinforcement learning in continuous state and action spaces using reachability analysis. While previous work mainly focuses on adversarial attacks for robust reinforcement learning, we train neural networks utilizing entire sets of perturbed inputs and maximize the worst-case reward. The obtained agents are verifiably more robust than agents obtained by related work, making them more applicable in safety-critical environments. This is demonstrated with an extensive empirical evaluation of four different benchmarks.
Abstract:Adjusting robot behavior to human preferences can require intensive human feedback, preventing quick adaptation to new users and changing circumstances. Moreover, current approaches typically treat user preferences as a reward, which requires a manual balance between task success and user satisfaction. To integrate new user preferences in a zero-shot manner, our proposed Text2Interaction framework invokes large language models to generate a task plan, motion preferences as Python code, and parameters of a safe controller. By maximizing the combined probability of task completion and user satisfaction instead of a weighted sum of rewards, we can reliably find plans that fulfill both requirements. We find that 83% of users working with Text2Interaction agree that it integrates their preferences into the robot's plan, and 94% prefer Text2Interaction over the baseline. Our ablation study shows that Text2Interaction aligns better with unseen preferences than other baselines while maintaining a high success rate.
Abstract:Formal verification techniques play a pivotal role in ensuring the safety of complex cyber-physical systems. To transfer model-based verification results to the real world, we require that the measurements of the target system lie in the set of reachable outputs of the corresponding model, a property we refer to as reachset conformance. This paper is on automatically identifying those reachset-conformant models. While state-of-the-art reachset-conformant identification methods focus on linear state-space models, we generalize these methods to nonlinear state-space models and linear and nonlinear input-output models. Furthermore, our identification framework adapts to different levels of prior knowledge on the system dynamics. In particular, we identify the set of model uncertainties for white-box models, the parameters and the set of model uncertainties for gray-box models, and entire reachset-conformant black-box models from data. For the black-box identification, we propose a new genetic programming variant, which we call conformant genetic programming. The robustness and efficacy of our framework are demonstrated in extensive numerical experiments using simulated and real-world data.
Abstract:Continuous action spaces in reinforcement learning (RL) are commonly defined as interval sets. While intervals usually reflect the action boundaries for tasks well, they can be challenging for learning because the typically large global action space leads to frequent exploration of irrelevant actions. Yet, little task knowledge can be sufficient to identify significantly smaller state-specific sets of relevant actions. Focusing learning on these relevant actions can significantly improve training efficiency and effectiveness. In this paper, we propose to focus learning on the set of relevant actions and introduce three continuous action masking methods for exactly mapping the action space to the state-dependent set of relevant actions. Thus, our methods ensure that only relevant actions are executed, enhancing the predictability of the RL agent and enabling its use in safety-critical applications. We further derive the implications of the proposed methods on the policy gradient. Using Proximal Policy Optimization (PPO), we evaluate our methods on three control tasks, where the relevant action set is computed based on the system dynamics and a relevant state set. Our experiments show that the three action masking methods achieve higher final rewards and converge faster than the baseline without action masking.
Abstract:The growing complexity of power system management has led to an increased interest in the use of reinforcement learning (RL). However, no tool for comprehensive and realistic benchmarking of RL in smart grids exists. One prerequisite for such a comparison is a safeguarding mechanism since vanilla RL controllers can not guarantee the satisfaction of system constraints. Other central requirements include flexible modeling of benchmarking scenarios, credible baselines, and the possibility to investigate the impact of forecast uncertainties. Our Python tool CommonPower is the first modular framework addressing these needs. CommonPower offers a unified interface for single-agent and multi-agent RL training algorithms and includes a built-in model predictive control approach based on a symbolic representation of the system equations. This makes it possible to combine model predictive controllers with RL controllers in the same system. Leveraging the symbolic system model, CommonPower facilitates the study of safeguarding strategies via the flexible formulation of safety layers. Furthermore equipped with a generic forecasting interface, CommonPower constitutes a versatile tool significantly augmenting the exploration of safe RL controllers in smart grids on several dimensions.
Abstract:Graph neural networks are becoming increasingly popular in the field of machine learning due to their unique ability to process data structured in graphs. They have also been applied in safety-critical environments where perturbations inherently occur. However, these perturbations require us to formally verify neural networks before their deployment in safety-critical environments as neural networks are prone to adversarial attacks. While there exists research on the formal verification of neural networks, there is no work verifying the robustness of generic graph convolutional network architectures with uncertainty in the node features and in the graph structure over multiple message-passing steps. This work addresses this research gap by explicitly preserving the non-convex dependencies of all elements in the underlying computations through reachability analysis with (matrix) polynomial zonotopes. We demonstrate our approach on three popular benchmark datasets.
Abstract:Motion planners are essential for the safe operation of automated vehicles across various scenarios. However, no motion planning algorithm has achieved perfection in the literature, and improving its performance is often time-consuming and labor-intensive. To tackle the aforementioned issues, we present DrPlanner, the first framework designed to automatically diagnose and repair motion planners using large language models. Initially, we generate a structured description of the planner and its planned trajectories from both natural and programming languages. Leveraging the profound capabilities of large language models in addressing reasoning challenges, our framework returns repaired planners with detailed diagnostic descriptions. Furthermore, the framework advances iteratively with continuous feedback from the evaluation of the repaired outcomes. Our approach is validated using search-based motion planners; experimental results highlight the need of demonstrations in the prompt and the ability of our framework in identifying and rectifying elusive issues effectively.
Abstract:Autonomous vehicles have to obey traffic rules. These rules are often formalized using temporal logic, resulting in constraints that are hard to solve using optimization-based motion planners. Reinforcement Learning (RL) is a promising method to find motion plans adhering to temporal logic specifications. However, vanilla RL algorithms are based on random exploration, which is inherently unsafe. To address this issue, we propose a provably safe RL approach that always complies with traffic rules. As a specific application area, we consider vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). We introduce an efficient verification approach that determines the compliance of actions with respect to the COLREGS formalized using temporal logic. Our action verification is integrated into the RL process so that the agent only selects verified actions. In contrast to agents that only integrate the traffic rule information in the reward function, our provably safe agent always complies with the formalized rules in critical maritime traffic situations and, thus, never causes a collision.