James Brusey

Learning from Less: SINDy Surrogates in RL

Apr 25, 2025

Aniket Dixit, Muhammad Ibrahim Khan, Faizan Ahmed, James Brusey

Abstract: This paper introduces an approach for developing surrogate environments in reinforcement learning (RL) using the Sparse Identification of Nonlinear Dynamics (SINDy) algorithm. We demonstrate the effectiveness of our approach through extensive experiments in OpenAI Gym environments, particularly Mountain Car and Lunar Lander. Our results show that SINDy-based surrogate models can accurately capture the underlying dynamics of these environments while reducing computational costs by 20-35%. With only 75 interactions for Mountain Car and 1000 for Lunar Lander, we achieve state-wise correlations exceeding 0.997, with mean squared errors as low as 3.11e-06 for Mountain Car velocity and 1.42e-06 for Lunar Lander position. RL agents trained in these surrogate environments require fewer total steps (65,075 vs. 100,000 for Mountain Car and 801,000 vs. 1,000,000 for Lunar Lander) while achieving comparable performance to those trained in the original environments, exhibiting similar convergence patterns and final performance metrics. This work contributes to the field of model-based RL by providing an efficient method for generating accurate, interpretable surrogate environments.

* World Models @ ICLR 2025
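
For readers unfamiliar with SINDy, the core idea is a sparse regression of state derivatives onto a library of candidate functions. The following is a minimal sketch using sequentially thresholded least squares on a toy one-dimensional system; the function name `stlsq`, the threshold, the polynomial library, and the toy dynamics are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def stlsq(theta, dxdt, threshold=0.1, n_iter=10):
    """Sequentially thresholded least squares: find sparse xi with theta @ xi ~ dxdt."""
    xi = np.linalg.lstsq(theta, dxdt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(xi) < threshold   # coefficients treated as inactive
        xi[small] = 0.0
        active = ~small
        if not active.any():
            break
        # Refit using only the library terms that survived thresholding
        xi[active] = np.linalg.lstsq(theta[:, active], dxdt, rcond=None)[0]
    return xi

# Toy data (assumed for illustration): samples of dx/dt = -2 x
x = np.linspace(-1.0, 1.0, 50)
dxdt = -2.0 * x

# Candidate library: [1, x, x^2, x^3]
theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
xi = stlsq(theta, dxdt)
print(xi)
```

On noise-free data like this, only the coefficient of the `x` term should survive thresholding, which is what makes the identified model interpretable.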

PyFlyt -- UAV Simulation Environments for Reinforcement Learning Research

Apr 03, 2023

Jun Jet Tai, Jim Wong, Mauro Innocente, Nadjim Horri, James Brusey, Swee King Phang

Abstract: Unmanned aerial vehicles (UAVs) have numerous applications, but achieving efficient and optimal flight can be a challenge. Reinforcement Learning (RL) has emerged as a promising approach to address this challenge, yet there is no standardized library for testing and benchmarking RL algorithms on UAVs. In this paper, we introduce PyFlyt, a platform built on the Bullet physics engine with native Gymnasium API support. PyFlyt provides modular implementations of simple components, such as motors and lifting surfaces, allowing for the implementation of UAVs of arbitrary configurations. Additionally, PyFlyt includes various task definitions and multiple reward function settings for each vehicle type. We demonstrate the effectiveness of PyFlyt by training various RL agents for two UAV models: quadrotor and fixed-wing. Our findings highlight the effectiveness of RL in UAV control and planning, and further show that it is possible to train agents in sparse reward settings for UAVs. PyFlyt fills a gap in the existing literature by providing a flexible and standardised platform for testing RL algorithms on UAVs. We believe that this will inspire more standardised research in this direction.

* Under Review for Transactions on Robotics

Some Supervision Required: Incorporating Oracle Policies in Reinforcement Learning via Epistemic Uncertainty Metrics

Aug 22, 2022

Jun Jet Tai, Jordan K. Terry, Mauro S. Innocente, James Brusey, Nadjim Horri

Abstract: An inherent problem in reinforcement learning is coping with policies that are uncertain about what action to take (or the value of a state). Model uncertainty, more formally known as epistemic uncertainty, refers to the expected prediction error of a model beyond the sampling noise. In this paper, we propose a metric for epistemic uncertainty estimation in Q-value functions, which we term pathwise epistemic uncertainty. We further develop a method to compute its approximate upper bound, which we call F-value. We experimentally apply the latter to Deep Q-Networks (DQN) and show that uncertainty estimation in reinforcement learning serves as a useful indication of learning progress. We then propose a new approach to improving sample efficiency in actor-critic algorithms by learning from an existing (previously learned or hard-coded) oracle policy while uncertainty is high, aiming to avoid unproductive random actions during training. We term this Critic Confidence Guided Exploration (CCGE). We implement CCGE on Soft Actor-Critic (SAC) using our F-value metric, which we apply to a handful of popular Gym environments and show that it achieves better sample efficiency and total episodic reward than vanilla SAC in limited contexts.

* Under review at AAAI23

Differential radial basis function network for sequence modelling

Oct 13, 2020

Kojo Sarfo Gyamfi, James Brusey, Elena Gaura

Abstract: We propose a differential radial basis function (RBF) network termed RBF-DiffNet -- whose hidden layer blocks are partial differential equations (PDEs) linear in terms of the RBF -- to make the baseline RBF network robust to noise in sequential data. Assuming that the sequential data derives from the discretisation of the solution to an underlying PDE, the differential RBF network learns constant linear coefficients of the PDE, consequently regularising the RBF network by following modified backward-Euler updates. We experimentally validate the differential RBF network on the logistic map chaotic timeseries as well as on 30 real-world timeseries provided by Walmart in the M5 forecasting competition. The proposed model is compared with the normalised and unnormalised RBF networks, ARIMA, and ensembles of multilayer perceptrons (MLPs) and recurrent networks with long short-term memory (LSTM) blocks. From the experimental results, RBF-DiffNet consistently shows a marked reduction over the baseline RBF network in terms of the prediction error (e.g., 26% reduction in the root mean squared scaled error on the M5 dataset); RBF-DiffNet also shows a comparable performance to the LSTM ensemble at less than one-sixteenth the LSTM computational time. Our proposed network consequently enables more accurate predictions -- in the presence of observational noise -- in sequence modelling tasks such as timeseries forecasting that leverage the model interpretability, fast training, and function approximation properties of the RBF network.

Reinforcement Learning-based Thermal Comfort Control for Vehicle Cabins

Sep 05, 2017

James Brusey, Diana Hintea, Elena Gaura, Neil Beloe

Abstract: Vehicle climate control systems aim to keep passengers thermally comfortable. However, current systems control temperature rather than thermal comfort and tend to be energy hungry, which is of particular concern when considering electric vehicles. This paper poses energy-efficient vehicle comfort control as a Markov Decision Process, which is then solved numerically using Sarsa(λ) and an empirically validated, single-zone, 1D thermal model of the cabin. The resulting controller was tested in simulation using 200 randomly selected scenarios and found to exceed the performance of bang-bang, proportional, simple fuzzy logic, and commercial controllers, with increases of 23%, 43%, 40%, and 56%, respectively. Compared to the next best performing controller, energy consumption is reduced by 13% while the proportion of time spent thermally comfortable is increased by 23%. These results indicate that this is a viable approach that promises to translate into substantial comfort and energy improvements in the car.
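
As background on the Sarsa(λ) algorithm named in the abstract, the sketch below shows one tabular update with accumulating eligibility traces. The table sizes, the step-size, discount, and trace-decay values, and the function name `sarsa_lambda_step` are assumptions chosen for illustration; the abstract does not specify how the paper represents states or sets these parameters.

```python
import numpy as np

def sarsa_lambda_step(Q, E, s, a, r, s_next, a_next,
                      alpha=0.5, gamma=0.9, lam=0.8):
    """One tabular Sarsa(lambda) update with accumulating traces (modifies Q, E in place)."""
    delta = r + gamma * Q[s_next, a_next] - Q[s, a]  # on-policy TD error
    E[s, a] += 1.0            # mark the state-action pair just visited
    Q += alpha * delta * E    # credit every recently visited pair
    E *= gamma * lam          # decay all traces
    return delta

# Hypothetical 3-state, 2-action problem with a single observed transition
Q = np.zeros((3, 2))
E = np.zeros_like(Q)
delta = sarsa_lambda_step(Q, E, s=0, a=1, r=1.0, s_next=1, a_next=0)
```

The trace matrix `E` is what distinguishes Sarsa(λ) from one-step Sarsa: a reward updates not only the latest state-action pair but also earlier ones, weighted by how recently they were visited.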

K-Means Clustering using Tabu Search with Quantized Means

Mar 24, 2017

Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt

Abstract: The Tabu Search (TS) metaheuristic has been proposed for K-Means clustering as an alternative to Lloyd's algorithm, which, for all its ease of implementation and fast runtime, has the major drawback of being trapped at local optima. While the TS approach can yield superior performance, it involves a high computational complexity. Moreover, the difficulty in parameter selection in the existing TS approach does not make it any more attractive. This paper presents an alternative, low-complexity formulation of the TS optimization procedure for K-Means clustering. This approach does not require many parameter settings. We initially constrain the centers to points in the dataset. We then aim at evolving these centers using a unique neighborhood structure that makes use of gradient information of the objective function. This results in an efficient exploration of the search space, after which the means are refined. The proposed scheme is implemented in MATLAB and tested on four real-world datasets, and it achieves a significant improvement over the existing TS approach in terms of the intra-cluster sum of squares and computational time.

* World Conference on Engineering and Computer Science
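
To make the objective concrete, the sketch below computes the intra-cluster sum of squares for centers constrained to dataset points, then for the refined (unconstrained) cluster means. It illustrates only the objective and the final refinement step, not the paper's Tabu Search neighborhood structure; the data, the chosen center indices, and the function name `wcss` are assumptions, and the paper's own implementation is in MATLAB rather than Python.

```python
import numpy as np

def wcss(X, centers):
    """Intra-cluster sum of squares: each point is charged to its nearest center."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).sum()

# Hypothetical toy dataset with two well-separated groups
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])

# Centers constrained to points in the dataset (indices chosen by hand here)
quantized = X[[0, 2]]
cost_quantized = wcss(X, quantized)

# Refinement: replace each center by the mean of the points assigned to it
d2 = ((X[:, None, :] - quantized[None, :, :]) ** 2).sum(axis=2)
labels = d2.argmin(axis=1)
refined = np.array([X[labels == k].mean(axis=0) for k in range(len(quantized))])
cost_refined = wcss(X, refined)
```

Restricting centers to data points turns the search into a discrete problem, and the refinement step afterwards can only keep or lower the objective for a fixed assignment.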

Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

Mar 24, 2017

Kojo Sarfo Gyamfi, James Brusey, Andrew Hunt, Elena Gaura

Abstract: Under normality and homoscedasticity assumptions, Linear Discriminant Analysis (LDA) is known to be optimal in terms of minimising the Bayes error for binary classification. In the heteroscedastic case, LDA is not guaranteed to minimise this error. Assuming heteroscedasticity, we derive a linear classifier, the Gaussian Linear Discriminant (GLD), that directly minimises the Bayes error for binary classification. In addition, we also propose a local neighbourhood search (LNS) algorithm to obtain a more robust classifier if the data is known to have a non-normal distribution. We evaluate the proposed classifiers on two artificial and ten real-world datasets that cut across a wide range of application areas including handwriting recognition, medical diagnosis and remote sensing, and then compare our algorithm against existing LDA approaches and other linear classifiers. The GLD is shown to outperform the original LDA procedure in terms of the classification accuracy under heteroscedasticity. While it compares favourably with other existing heteroscedastic LDA approaches, the GLD requires as much as 60 times lower training time on some datasets. Our comparison with the support vector machine (SVM) also shows that the GLD, together with the LNS, requires as much as 150 times lower training time to achieve an equivalent classification accuracy on some of the datasets. Thus, our algorithms can provide a cheap and reliable option for classification in a lot of expert systems.
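
As a rough illustration of the quantity involved, the sketch below evaluates the misclassification probability of a given linear rule `w @ x + b > 0` when the two classes are Gaussian with different covariances. This is the standard closed form for a fixed linear rule, not the paper's derivation of the GLD; the function names, the one-dimensional example, and the equal priors are assumptions made for illustration.

```python
import math
import numpy as np

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def linear_rule_error(w, b, mu0, cov0, mu1, cov1, prior0=0.5):
    """Error of the rule 'predict class 1 if w @ x + b > 0' under Gaussian classes."""
    w = np.asarray(w, dtype=float)
    s0 = math.sqrt(w @ cov0 @ w)   # std of the projection for class 0
    s1 = math.sqrt(w @ cov1 @ w)   # std of the projection for class 1
    err0 = normal_cdf((w @ mu0 + b) / s0)    # class 0 points scored positive
    err1 = normal_cdf(-(w @ mu1 + b) / s1)   # class 1 points scored non-positive
    return prior0 * err0 + (1.0 - prior0) * err1

# Hypothetical 1D example: equal covariances versus unequal covariances
mu0, mu1 = np.array([-1.0]), np.array([1.0])
err_equal = linear_rule_error([1.0], 0.0, mu0, np.array([[1.0]]), mu1, np.array([[1.0]]))
err_unequal = linear_rule_error([1.0], 0.0, mu0, np.array([[1.0]]), mu1, np.array([[4.0]]))
```

With unequal covariances the midpoint threshold is no longer the best choice of `b`, which is the kind of situation in which a rule tuned under heteroscedasticity can outperform classical LDA.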
