Harry Emerson

Flexible Blood Glucose Control: Offline Reinforcement Learning from Human Feedback

Jan 27, 2025

Harry Emerson, Sam Gordon James, Matthew Guy, Ryan McConville

Abstract: Reinforcement learning (RL) has demonstrated success in automating insulin dosing in simulated type 1 diabetes (T1D) patients but is currently unable to incorporate patient expertise and preference. This work introduces PAINT (Preference Adaptation for INsulin control in T1D), an original RL framework for learning flexible insulin dosing policies from patient records. PAINT employs a sketch-based approach to reward learning, in which past data is annotated with a continuous reward signal reflecting the patient's desired outcomes. The labelled data trains a reward model, which informs the actions of a novel safety-constrained offline RL algorithm designed to restrict actions to a safe strategy and to enable preference tuning via a sliding scale. In-silico evaluation shows that PAINT achieves common glucose goals through simple labelling of desired states, reducing glycaemic risk by 15% over a commercial benchmark. Action labelling can also be used to incorporate patient expertise, demonstrating an ability to pre-empt meals (+10% time-in-range post-meal) and to address certain device errors (-1.6% variance post-error) with patient guidance. These results hold under realistic conditions, including limited samples, labelling errors, and intra-patient variability. This work illustrates PAINT's potential in real-world T1D management and, more broadly, in any task requiring rapid and precise preference learning under safety constraints.

* 11 pages, 5 figures
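The abstract reports results as glycaemic risk and time-in-range. The sketch below computes one common version of each: time-in-range as the percentage of readings between 70 and 180 mg/dL, and the Kovatchev low and high blood glucose indices (LBGI, HBGI), whose sum serves as a risk index. The thresholds and constants are assumptions drawn from the wider T1D literature; the paper's exact metric definitions may differ.

```python
import math
from typing import Sequence, Tuple

# Assumed consensus target range in mg/dL (not taken from the paper).
TIR_LOW, TIR_HIGH = 70.0, 180.0


def time_in_range(glucose: Sequence[float], low: float = TIR_LOW, high: float = TIR_HIGH) -> float:
    """Percentage of readings inside [low, high] mg/dL."""
    if not glucose:
        raise ValueError("glucose trace is empty")
    inside = sum(1 for g in glucose if low <= g <= high)
    return 100.0 * inside / len(glucose)


def risk_index(glucose: Sequence[float]) -> Tuple[float, float, float]:
    """Kovatchev-style (LBGI, HBGI, LBGI + HBGI) for readings in mg/dL."""
    if not glucose:
        raise ValueError("glucose trace is empty")
    low_total, high_total = 0.0, 0.0
    for g in glucose:
        # Symmetrising transform: f is about 0 near 112.5 mg/dL,
        # negative for low glucose and positive for high glucose.
        f = 1.509 * (math.log(g) ** 1.084 - 5.381)
        risk = 10.0 * f * f
        if f < 0:
            low_total += risk
        elif f > 0:
            high_total += risk
    lbgi = low_total / len(glucose)
    hbgi = high_total / len(glucose)
    return lbgi, hbgi, lbgi + hbgi


if __name__ == "__main__":
    trace = [65.0, 90.0, 140.0, 200.0, 250.0]  # made-up readings, mg/dL
    print(time_in_range(trace))  # → 40.0
    lbgi, hbgi, ri = risk_index(trace)
    print(round(lbgi, 2), round(hbgi, 2), round(ri, 2))
```

Squaring the transformed value means a reading of 50 mg/dL and one of roughly 300 mg/dL are penalised on a comparable scale, which a plain distance from target in mg/dL would not do.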

CybORG++: An Enhanced Gym for the Development of Autonomous Cyber Agents

Oct 18, 2024

Harry Emerson, Liz Bates, Chris Hicks, Vasilios Mavroudis

Abstract: CybORG++ is an advanced toolkit for reinforcement learning research focused on network defence. Building on the CAGE 2 CybORG environment, it introduces key improvements, including enhanced debugging capabilities, refined support for agent implementation, and a streamlined environment that enables faster training and easier customisation. Along with fixing several software bugs from its predecessor, CybORG++ introduces MiniCAGE, a lightweight version of CAGE 2 that dramatically improves performance, with up to 1000x faster execution in parallel iterations, without sacrificing accuracy or core functionality. CybORG++ serves as a robust platform for developing and evaluating defensive agents, making it a valuable resource for advancing enterprise network defence research.

* 8 pages, 3 figures, and an appendix

The Safety Challenges of Deep Learning in Real-World Type 1 Diabetes Management

Oct 23, 2023

Harry Emerson, Ryan McConville, Matthew Guy

Abstract: Blood glucose simulation allows the effectiveness of type 1 diabetes (T1D) management strategies to be evaluated without patient harm. Deep learning algorithms provide a promising avenue for extending simulator capabilities; however, they are limited in that they do not necessarily learn physiologically correct glucose dynamics and can learn incorrect and potentially dangerous relationships from confounders in training data. This is likely to matter more in real-world scenarios, where data is not collected under a strict research protocol. This work explores the implications of using deep learning algorithms trained on real-world data to model glucose dynamics. Free-living data from the OpenAPS Data Commons was processed and supplemented with patient-reported tags of challenging diabetes events, constituting one of the most detailed real-world T1D datasets. This dataset was used to train and evaluate state-of-the-art glucose simulators, comparing their prediction error across safety-critical scenarios and assessing the physiological appropriateness of the learned dynamics using Shapley Additive Explanations (SHAP). While the prediction accuracy of deep learning surpassed that of the widely used mathematical simulator approach, the model deteriorated in safety-critical scenarios and struggled to leverage self-reported meal and exercise information. SHAP value analysis also indicated that the model had fundamentally confused the roles of insulin and carbohydrates, even though distinguishing them is one of the most basic principles of T1D management. This work highlights the importance of considering physiological appropriateness when using deep learning to model real-world systems in T1D and healthcare more broadly, and provides recommendations for building models that are robust to real-world data constraints.

* 15 pages, 3 figures
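SHAP estimates Shapley values, which split a model's prediction among its input features. With only a handful of features they can be computed exactly by enumerating every coalition, treating "absent" features as fixed baseline values. The sketch below does this for a hypothetical linear glucose predictor whose feature names and coefficients are invented for illustration; it is neither the paper's simulator nor the `shap` library's API. It shows the kind of physiological check the abstract describes: extra insulin should push predicted glucose down and extra carbohydrate should push it up.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict

Features = Dict[str, float]


def exact_shapley(model: Callable[[Features], float], x: Features, baseline: Features) -> Features:
    """Exact Shapley values by enumerating every coalition of the other features."""
    names = list(x)
    n = len(names)
    phi: Features = {}
    for i in names:
        others = [f for f in names if f != i]
        total = 0.0
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                # Features outside the coalition take their baseline value.
                without_i = {f: (x[f] if f in coalition else baseline[f]) for f in names}
                with_i = {**without_i, i: x[i]}
                total += weight * (model(with_i) - model(without_i))
        phi[i] = total
    return phi


def predicted_glucose(features: Features) -> float:
    """Hypothetical linear predictor in mg/dL; coefficients are made up."""
    return features["glucose"] + 3.0 * features["carbs_g"] - 40.0 * features["insulin_u"]


if __name__ == "__main__":
    x = {"glucose": 150.0, "carbs_g": 30.0, "insulin_u": 2.0}
    baseline = {"glucose": 150.0, "carbs_g": 0.0, "insulin_u": 0.0}
    phi = exact_shapley(predicted_glucose, x, baseline)
    print({k: round(v, 6) for k, v in phi.items()})
    # → {'glucose': 0.0, 'carbs_g': 90.0, 'insulin_u': -80.0}
```

Here insulin receives a negative attribution and carbohydrate a positive one, as physiology requires. A model that had confused their roles, as the abstract reports for the trained simulator, would show these signs reversed. Exact enumeration grows exponentially with the number of features, which is why SHAP relies on approximations for realistic models.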

Offline Reinforcement Learning for Safer Blood Glucose Control in People with Type 1 Diabetes

Apr 07, 2022

Harry Emerson, Matt Guy, Ryan McConville

Abstract: Hybrid closed loop systems represent the future of care for people with type 1 diabetes (T1D). These devices usually use simple control algorithms to select the optimal insulin dose for maintaining blood glucose levels within a healthy range. Online reinforcement learning (RL) has been used to further enhance glucose control in these devices. Previous approaches have been shown to reduce patient risk and improve time spent in the target range compared with classical control algorithms, but they are prone to instability in the learning process, often resulting in the selection of unsafe actions. This work presents an evaluation of offline RL as a means of developing clinically effective dosing policies without the need for patient interaction. This paper examines the utility of BCQ, CQL and TD3-BC in managing the blood glucose of nine virtual patients within the UVA/Padova glucose dynamics simulator. When trained on less than a tenth of the data required by online RL approaches, offline RL is shown to significantly increase time in the healthy blood glucose range compared with the strongest state-of-the-art baseline. This is achieved without any associated increase in low blood glucose events. Offline RL is also shown to correct for common and challenging scenarios such as incorrect bolus dosing, irregular meal timings and sub-optimal training data.

* The code for this work is available at https://github.com/hemerson1/offline-glucose
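TD3-BC is one of the three offline RL algorithms evaluated above. The sketch below shows its actor objective in the form published with TD3+BC: the policy maximises λ·Q(s, π(s)) while a squared-error term keeps π(s) close to the logged action a, with λ = α / mean|Q| and α = 2.5 as that paper's default. It operates on plain Python lists of scalar doses to stay self-contained. It is not code from the hemerson1/offline-glucose repository, and a real implementation would compute this on tensors, with λ detached and gradients flowing only through the actor.

```python
from typing import Sequence


def td3_bc_actor_loss(
    q_pi: Sequence[float],
    pi_actions: Sequence[float],
    data_actions: Sequence[float],
    alpha: float = 2.5,
) -> float:
    """TD3+BC actor loss for a batch of scalar actions (e.g. insulin doses).

    q_pi:         critic values Q(s, pi(s)) for the actor's proposed actions
    pi_actions:   actions pi(s) proposed by the actor
    data_actions: actions a actually taken in the logged dataset
    """
    n = len(q_pi)
    if n == 0 or len(pi_actions) != n or len(data_actions) != n:
        raise ValueError("batches must be non-empty and equally sized")
    # lambda rescales Q so the RL term is comparable in size to the BC term;
    # in a tensor implementation it is treated as a constant (no gradient).
    mean_abs_q = sum(abs(q) for q in q_pi) / n
    lam = alpha / max(mean_abs_q, 1e-8)
    # Maximising Q is written as minimising -Q.
    rl_term = -lam * sum(q_pi) / n
    # Behaviour-cloning penalty: mean squared distance from logged actions.
    bc_term = sum((p - a) ** 2 for p, a in zip(pi_actions, data_actions)) / n
    return rl_term + bc_term


if __name__ == "__main__":
    # Made-up batch: negative Q values, as with a risk-based reward.
    print(td3_bc_actor_loss([-10.0, -20.0], [1.0, 2.0], [1.0, 1.0]))
```

The single hyperparameter α trades off the two terms: as α approaches 0 the objective reduces to behaviour cloning of the logged doses, while larger values let the critic pull the policy further from the data.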
