University of Oxford
Abstract:To apply reinforcement learning to safety-critical applications, we ought to provide safety guarantees during both policy training and deployment. In this work we present novel theoretical results that provide a bound on the probability of violating a safety property for a new task-specific policy in a model-free, episodic setup: the bound, based on a `maximum policy ratio' computed with respect to a `safe' base policy, also applies more generally to temporally-extended properties (beyond safety) and to robust control problems. Building on these results, we present SPoRt, which provides a data-driven approach, based on scenario theory, for obtaining such a bound for the base policy, and which includes Projected PPO, a new projection-based approach for training the task-specific policy while maintaining a user-specified bound on property violation. SPoRt hence enables the user to trade off safety guarantees against task-specific performance. Accordingly, we present experimental results demonstrating this trade-off, as well as a comparison of the theoretical bound to posterior bounds based on empirical violation rates.
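To illustrate how a maximum policy ratio can yield such a bound, here is a sketch under simplifying assumptions (episodic setup with horizon $H$, shared dynamics, and a base policy $\pi_0$ whose violation probability is at most $\epsilon$); the precise statement in the paper may be tighter or differ in form. Writing $\rho = \max_{s,a} \pi(a \mid s)/\pi_0(a \mid s)$ for a task-specific policy $\pi$, the probability of any trajectory changes by at most a factor $\rho^{H}$ when switching from $\pi_0$ to $\pi$, since the dynamics terms cancel in the likelihood ratio. Summing over the set $V$ of property-violating trajectories gives
$$\mathbb{P}_{\pi}(\text{violation}) \;=\; \sum_{\tau \in V} \mathbb{P}_{\pi}(\tau) \;\le\; \rho^{H} \sum_{\tau \in V} \mathbb{P}_{\pi_0}(\tau) \;=\; \rho^{H}\, \mathbb{P}_{\pi_0}(\text{violation}) \;\le\; \rho^{H} \epsilon .$$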
Abstract:Recent approaches to leveraging deep learning for computing reachable sets of continuous-time dynamical systems have gained popularity over traditional level-set methods, as they overcome the curse of dimensionality. However, as with level-set methods, considerable care needs to be taken in limiting approximation errors, particularly since no guarantees are provided during training on the accuracy of the learned reachable set. To address this limitation, we introduce an epsilon-approximate Hamilton-Jacobi Partial Differential Equation (HJ-PDE), which establishes a relationship between the training loss and the accuracy of the learned reachable set. To formally certify this approximation, we leverage Satisfiability Modulo Theories (SMT) solvers to bound the residual error of the HJ-based loss function across the domain of interest. Using Counterexample-Guided Inductive Synthesis (CEGIS), we close the loop between learning and verification by fine-tuning the neural network on counterexamples found by the SMT solver, thus improving the accuracy of the learned reachable set. To the best of our knowledge, Certified Approximate Reachability (CARe) is the first approach to provide soundness guarantees on learned reachable sets of continuous dynamical systems.
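The learning-verification loop can be sketched as follows; this is a minimal illustration of the CEGIS structure described above, not the CARe implementation, and the callables `train_step` and `residual_bound` are hypothetical stand-ins for the HJ-loss training step and the SMT residual query.

```python
# Illustrative CEGIS-style loop (assumed structure, not the CARe API).
# `train_step(model, states)` minimises the HJ-based residual loss on `states`;
# `residual_bound(model, epsilon)` stands in for the SMT query: it returns None
# if |residual| <= epsilon holds over the domain of interest, otherwise a
# counterexample state where the bound is violated.

def cegis_certify(model, train_step, residual_bound, initial_states,
                  epsilon, max_rounds=20):
    dataset = list(initial_states)
    for _ in range(max_rounds):
        train_step(model, dataset)                  # fit the neural value function
        counterexample = residual_bound(model, epsilon)
        if counterexample is None:                  # solver certified the residual bound
            return True, model
        dataset.append(counterexample)              # fine-tune on the violating state
    return False, model                             # no certificate within the budget
```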
Abstract:One approach to improving neural network interpretability is clusterability, i.e., splitting a model into disjoint clusters that can be studied independently. We define a measure of clusterability and, via spectral graph clustering, show that pre-trained models form highly enmeshed clusters. We therefore train models to be more modular using a ``clusterability loss'' function that encourages the formation of non-interacting clusters. Using automated interpretability techniques, we show that our method can help train models that are more modular and learn different, disjoint, and smaller circuits. We investigate CNNs trained on MNIST and CIFAR, small transformers trained on modular addition, and language models. Our approach provides a promising direction for training neural networks that learn simpler functions and are easier to interpret.
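As a concrete, hedged illustration of this kind of measure (a toy proxy under our own choices of graph construction and statistic, not the paper's definition): spectrally cluster the weight graph of a layer and report the fraction of absolute weight mass that stays within clusters.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy clusterability proxy (illustrative only): treat |W| as a bipartite graph
# between input and output units, spectrally cluster all units jointly, and
# report the fraction of weight mass falling within clusters. Values closer to
# 1 indicate weights that split more cleanly into non-interacting clusters.

def clusterability_proxy(weight, n_clusters=4, seed=0):
    a = np.abs(weight)                              # shape (n_out, n_in)
    n_out, n_in = a.shape
    adj = np.zeros((n_out + n_in, n_out + n_in))    # symmetric unit-unit adjacency
    adj[:n_out, n_out:] = a
    adj[n_out:, :n_out] = a.T
    labels = SpectralClustering(n_clusters=n_clusters, affinity="precomputed",
                                random_state=seed).fit_predict(adj)
    out_labels, in_labels = labels[:n_out], labels[n_out:]
    within = a[out_labels[:, None] == in_labels[None, :]].sum()
    return within / a.sum()

rng = np.random.default_rng(0)
print(clusterability_proxy(rng.standard_normal((64, 64))))  # unstructured weights score low
```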
Abstract:AI control protocols are plans for usefully deploying AI systems in a way that is safe, even if the AI intends to subvert the protocol. Previous work evaluated protocols by subverting them with a human-AI red team, where an AI follows the human-written strategy. This paper investigates how well AI systems can generate and act on strategies for subverting control protocols whilst operating without private memory. Usually, decision-makers only need the ability to generate optimal plans. However, to subvert a protocol, an AI system must be able to reliably generate optimal plans in each context; coordinate plans with other instances of itself without communicating; and take actions with well-calibrated probabilities. We develop evaluations for a model's ability to generate subversion strategies, together with affordance experiments that help isolate component capabilities. Our initial evaluations with Claude-3.5-sonnet highlight reasons why statelessly strategizing to subvert control protocols might be hard. We hope that our work can serve as a starting point for evaluations aimed at assessing whether a model is capable of strategizing to subvert control protocols.
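As a hedged, back-of-the-envelope illustration (ours, not an example from the paper) of why acting without private memory requires well-calibrated action probabilities: suppose a strategy calls for misbehaving on roughly one of $N$ independently presented opportunities, so that acting too often risks detection. A stateless instance cannot count its previous actions, so each instance must act independently with some probability $p$, making the number of actions $\mathrm{Binomial}(N, p)$. Calibrating $p = 1/N$ gives one action in expectation, yet
$$\Pr[\text{no action at all}] \;=\; (1 - p)^{N} \;=\; \Big(1 - \tfrac{1}{N}\Big)^{N} \;\approx\; e^{-1} \approx 0.37,$$
and mis-estimating $p$ by even a modest factor shifts the whole outcome distribution, which is one reason stateless subversion strategies can be hard to execute reliably.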
Abstract:The aim of inverse reinforcement learning (IRL) is to infer an agent's preferences from observing their behaviour. Usually, preferences are modelled as a reward function, $R$, and behaviour is modelled as a policy, $\pi$. One of the central difficulties in IRL is that multiple preferences may lead to the same observed behaviour. That is, $R$ is typically underdetermined by $\pi$, which means that $R$ is only partially identifiable. Recent work has characterised the extent of this partial identifiability for different types of agents, including optimal and Boltzmann-rational agents. However, work so far has only considered agents that discount future reward exponentially: this is a serious limitation, especially given that extensive work in the behavioural sciences suggests that humans are better modelled as discounting hyperbolically. In this work, we characterise partial identifiability in IRL for agents with non-exponential discounting: our results are particularly relevant for hyperbolic discounting, but they also apply more generally to agents that use other types of (non-exponential) discounting. Significantly, we show that IRL is generally unable to infer enough information about $R$ to identify the correct optimal policy, which entails that IRL alone can be insufficient to adequately characterise the preferences of such agents.
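For concreteness, and only to fix notation (these are the standard parameterisations, not results from the paper): an exponential discounter weights a reward delayed by $t$ steps by $D_{\exp}(t) = \gamma^{t}$ with $\gamma \in (0,1)$, whereas a hyperbolic discounter uses $D_{\mathrm{hyp}}(t) = 1/(1 + kt)$ with $k > 0$. Only the exponential form satisfies $D(t_1 + t_2) = D(t_1)\, D(t_2)$, which is why non-exponential discounters can exhibit time-inconsistent preferences and is one reason the exponential-case analysis does not carry over directly.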
Abstract:In light of the inherently complex and dynamic nature of real-world environments, incorporating risk measures is crucial for the robustness evaluation of deep learning models. In this work, we propose a Risk-Averse Certification framework for Bayesian neural networks called RAC-BNN. Our method leverages sampling and optimisation to compute a sound approximation of the output set of a BNN, represented using a set of template polytopes. To enhance robustness evaluation, we integrate a coherent distortion risk measure, Conditional Value at Risk (CVaR), into the certification framework, providing probabilistic guarantees based on empirical distributions obtained through sampling. We validate RAC-BNN on a range of regression and classification benchmarks and compare its performance with a state-of-the-art method. The results show that RAC-BNN effectively quantifies robustness under worst-performing risky scenarios, and achieves tighter certified bounds and higher efficiency in complex tasks.
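For reference, one standard formulation of the risk measure used above (the paper's conventions for the tail direction and the level $\alpha$ may differ): for a loss variable $Z$ and $\alpha \in (0,1)$,
$$\mathrm{CVaR}_{\alpha}(Z) \;=\; \inf_{t \in \mathbb{R}} \Big\{ t + \tfrac{1}{1-\alpha}\, \mathbb{E}\big[(Z - t)_{+}\big] \Big\},$$
which, for continuous distributions, equals the expected loss conditional on $Z$ exceeding its $\alpha$-quantile (the Value at Risk); in RAC-BNN the expectation is taken over the empirical distribution obtained through sampling.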
Abstract:The aim of Inverse Reinforcement Learning (IRL) is to infer a reward function $R$ from a policy $\pi$. This problem is difficult, for several reasons. First of all, there are typically multiple reward functions which are compatible with a given policy; this means that the reward function is only *partially identifiable*, and that IRL contains a certain fundamental degree of ambiguity. Secondly, in order to infer $R$ from $\pi$, an IRL algorithm must have a *behavioural model* of how $\pi$ relates to $R$. However, the true relationship between human preferences and human behaviour is very complex, and practically impossible to fully capture with a simple model. This means that the behavioural model in practice will be *misspecified*, which raises the worry that it might lead to unsound inferences if applied to real-world data. In this paper, we provide a comprehensive mathematical analysis of partial identifiability and misspecification in IRL. Specifically, we fully characterise and quantify the ambiguity of the reward function for all of the behavioural models that are most common in the current IRL literature. We also provide necessary and sufficient conditions that describe precisely how the observed demonstrator policy may differ from each of the standard behavioural models before that model leads to faulty inferences about the reward function $R$. In addition to this, we introduce a cohesive framework for reasoning about partial identifiability and misspecification in IRL, together with several formal tools that can be used to easily derive the partial identifiability and misspecification robustness of new IRL models, or analyse other kinds of reward learning algorithms.
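A classic instance of this ambiguity, included purely for illustration (the characterisations in the paper are far more general): for any potential function $\Phi$ and discount $\gamma$, the shaped reward
$$R'(s, a, s') \;=\; R(s, a, s') + \gamma \Phi(s') - \Phi(s)$$
shifts every state-action value by the state-dependent constant $-\Phi(s)$, so both optimal and Boltzmann-rational policies are left unchanged; no behavioural model of these types can therefore distinguish $R$ from $R'$ on the basis of $\pi$ alone.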
Abstract:A crucial capability of Machine Learning models in real-world applications is the ability to continuously learn new tasks. This adaptability allows them to respond to potentially inevitable shifts in the data-generating distribution over time. However, in Continual Learning (CL) settings, models often struggle to balance learning new tasks (plasticity) with retaining previous knowledge (memory stability). Consequently, they are susceptible to Catastrophic Forgetting, which degrades performance and undermines the reliability of deployed systems. Variational Continual Learning methods tackle this challenge by employing a learning objective that recursively updates the posterior distribution and constrains it to stay close to the latest posterior estimate. Nonetheless, we argue that these methods may be ineffective due to compounding approximation errors over successive recursions. To mitigate this, we propose new learning objectives that integrate the regularization effects of multiple previous posterior estimations, preventing individual errors from dominating future posterior updates and compounding over time. We reveal insightful connections between these objectives and Temporal-Difference methods, a popular learning mechanism in Reinforcement Learning and Neuroscience. We evaluate the proposed objectives on challenging versions of popular CL benchmarks, demonstrating that they outperform standard Variational CL methods and non-variational baselines, effectively alleviating Catastrophic Forgetting.
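To make the recursion concrete, the standard variational continual learning update at task $t$ maximises
$$\mathbb{E}_{\theta \sim q}\big[\log p(\mathcal{D}_t \mid \theta)\big] \;-\; \mathrm{KL}\big(q \,\|\, q_{t-1}\big),$$
so each posterior estimate is regularised only towards the immediately preceding one and its approximation error is inherited by all later updates. A schematic of the idea described above (our illustration, not the paper's exact objective) instead regularises towards a weighted combination of several earlier estimates,
$$\mathbb{E}_{\theta \sim q}\big[\log p(\mathcal{D}_t \mid \theta)\big] \;-\; \sum_{i \ge 1} w_i \,\mathrm{KL}\big(q \,\|\, q_{t-i}\big), \qquad \textstyle\sum_i w_i = 1,$$
mirroring the way Temporal-Difference methods blend multi-step targets.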
Abstract:Linear temporal logic (LTL) has recently been adopted as a powerful formalism for specifying complex, temporally extended tasks in reinforcement learning (RL). However, learning policies that efficiently satisfy arbitrary specifications not observed during training remains a challenging problem. Existing approaches suffer from several shortcomings: they are often only applicable to finite-horizon fragments of LTL, are restricted to suboptimal solutions, and do not adequately handle safety constraints. In this work, we propose a novel learning approach to address these concerns. Our method leverages the structure of Büchi automata, which explicitly represent the semantics of LTL specifications, to learn policies conditioned on sequences of truth assignments that lead to satisfying the desired formulae. Experiments in a variety of discrete and continuous domains demonstrate that our approach is able to zero-shot satisfy a wide range of finite- and infinite-horizon specifications, and outperforms existing methods in terms of both satisfaction probability and efficiency.
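As a small example (ours, not drawn from the paper): the infinite-horizon specification
$$\varphi \;=\; \mathbf{G}\,\neg b \;\wedge\; \mathbf{G}\mathbf{F}\, a$$
requires avoiding $b$ forever while visiting $a$ infinitely often. Its Büchi automaton accepts exactly those infinite sequences of truth assignments in which $b$ is always false and $a$ holds infinitely often, so a policy conditioned on such assignment sequences must repeatedly reach $a$-satisfying states without ever entering a $b$-satisfying one, behaviour that finite-horizon fragments of LTL cannot capture.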
Abstract:To evaluate the safety and usefulness of deployment protocols for untrusted AIs, AI Control uses a red-teaming exercise played between a protocol designer and an adversary. This paper introduces AI-Control Games, a formal decision-making model of the red-teaming exercise as a multi-objective, partially observable, stochastic game. We also introduce methods for finding optimal protocols in AI-Control Games, by reducing them to a set of zero-sum partially observable stochastic games. We apply our formalism to model, evaluate and synthesise protocols for deploying untrusted language models as programming assistants, focusing on Trusted Monitoring protocols, which use weaker language models and limited human assistance. Finally, we demonstrate the utility of our formalism by showcasing improvements over empirical studies in existing settings, evaluating protocols in new settings, and analysing how modelling assumptions affect the safety and usefulness of protocols.
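Schematically, and only as a hedged sketch of the kind of object involved (not the paper's exact definition): an AI-Control Game can be pictured as a tuple
$$\big(\mathcal{S}, \; \mathcal{A}_{\mathrm{blue}}, \; \mathcal{A}_{\mathrm{red}}, \; T, \; \mathcal{O}, \; R_{\mathrm{safety}}, \; R_{\mathrm{use}}\big),$$
in which the protocol designer (blue) cares about both safety and usefulness while the adversary (red) seeks only to compromise safety. Fixing a scalarisation weight $\lambda \in [0,1]$ and having blue optimise $\lambda R_{\mathrm{safety}} + (1-\lambda) R_{\mathrm{use}}$ against a worst-case red player yields, for each $\lambda$, a zero-sum partially observable stochastic game, which is one way a reduction to a set of zero-sum games can be realised.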