Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Abdullah Akgül

Improving Plasticity in Non-stationary Reinforcement Learning with Evidential Proximal Policy Optimization

Mar 03, 2025

Abdullah Akgül, Gulcin Baykal, Manuel Haußmann, Melih Kandemir

Abstract:On-policy reinforcement learning algorithms use the most recently learned policy to interact with the environment and update it using the latest gathered trajectories, making them well-suited for adapting to non-stationary environments where dynamics change over time. However, previous studies show that they struggle to maintain plasticity$\unicode{x2013}$the ability of neural networks to adjust their synaptic connections$\unicode{x2013}$with overfitting identified as the primary cause. To address this, we present the first application of evidential learning in an on-policy reinforcement learning setting: $\textit{Evidential Proximal Policy Optimization (EPPO)}$. EPPO incorporates all sources of error in the critic network's approximation$\unicode{x2013}$i.e., the baseline function in advantage calculation$\unicode{x2013}$by modeling the epistemic and aleatoric uncertainty contributions to the approximation's total variance. We achieve this by using an evidential neural network, which serves as a regularizer to prevent overfitting. The resulting probabilistic interpretation of the advantage function enables optimistic exploration, thus maintaining the plasticity. Through experiments on non-stationary continuous control tasks, where the environment dynamics change at regular intervals, we demonstrate that EPPO outperforms state-of-the-art on-policy reinforcement learning variants in both task-specific and overall return.

Via

Access Paper or Ask Questions

Deterministic Uncertainty Propagation for Improved Model-Based Offline Reinforcement Learning

Jun 06, 2024

Abdullah Akgül, Manuel Haußmann, Melih Kandemir

Abstract:Current approaches to model-based offline Reinforcement Learning (RL) often incorporate uncertainty-based reward penalization to address the distributional shift problem. While these approaches have achieved some success, we argue that this penalization introduces excessive conservatism, potentially resulting in suboptimal policies through underestimation. We identify as an important cause of over-penalization the lack of a reliable uncertainty estimator capable of propagating uncertainties in the Bellman operator. The common approach to calculating the penalty term relies on sampling-based uncertainty estimation, resulting in high variance. To address this challenge, we propose a novel method termed Moment Matching Offline Model-Based Policy Optimization (MOMBO). MOMBO learns a Q-function using moment matching, which allows us to deterministically propagate uncertainties through the Q-function. We evaluate MOMBO's performance across various environments and demonstrate empirically that MOMBO is a more stable and sample-efficient approach.

Via

Access Paper or Ask Questions

Calibrating Bayesian UNet++ for Sub-Seasonal Forecasting

Apr 04, 2024

Busra Asan, Abdullah Akgül, Alper Unal, Melih Kandemir, Gozde Unal

Abstract:Seasonal forecasting is a crucial task when it comes to detecting the extreme heat and colds that occur due to climate change. Confidence in the predictions should be reliable since a small increase in the temperatures in a year has a big impact on the world. Calibration of the neural networks provides a way to ensure our confidence in the predictions. However, calibrating regression models is an under-researched topic, especially in forecasters. We calibrate a UNet++ based architecture, which was shown to outperform physics-based models in temperature anomalies. We show that with a slight trade-off between prediction error and calibration error, it is possible to get more reliable and sharper forecasts. We believe that calibration should be an important part of safety-critical machine learning applications such as weather forecasters.

* Accepted as a workshop paper at "ICLR 2024 Tackling Climate Change with Machine Learning"

Via

Access Paper or Ask Questions

BOF-UCB: A Bayesian-Optimistic Frequentist Algorithm for Non-Stationary Contextual Bandits

Jul 19, 2023

Nicklas Werge, Abdullah Akgül, Melih Kandemir

Abstract:We propose a novel Bayesian-Optimistic Frequentist Upper Confidence Bound (BOF-UCB) algorithm for stochastic contextual linear bandits in non-stationary environments. This unique combination of Bayesian and frequentist principles enhances adaptability and performance in dynamic settings. The BOF-UCB algorithm utilizes sequential Bayesian updates to infer the posterior distribution of the unknown regression parameter, and subsequently employs a frequentist approach to compute the Upper Confidence Bound (UCB) by maximizing the expected reward over the posterior distribution. We provide theoretical guarantees of BOF-UCB's performance and demonstrate its effectiveness in balancing exploration and exploitation on synthetic datasets and classical control tasks in a reinforcement learning setting. Our results show that BOF-UCB outperforms existing methods, making it a promising solution for sequential decision-making in non-stationary environments.

Via

Access Paper or Ask Questions

PAC-Bayesian Soft Actor-Critic Learning

Jan 30, 2023

Bahareh Tasdighi, Abdullah Akgül, Kenny Kazimirzak Brink, Melih Kandemir

Abstract:Actor-critic algorithms address the dual goals of reinforcement learning, policy evaluation and improvement, via two separate function approximators. The practicality of this approach comes at the expense of training instability, caused mainly by the destructive effect of the approximation errors of the critic on the actor. We tackle this bottleneck by employing an existing Probably Approximately Correct (PAC) Bayesian bound for the first time as the critic training objective of the Soft Actor-Critic (SAC) algorithm. We further demonstrate that the online learning performance improves significantly when a stochastic actor explores multiple futures by critic-guided random search. We observe our resulting algorithm to compare favorably to the state of the art on multiple classical control and locomotion tasks in both sample efficiency and asymptotic performance.

* 12 pages, 3 figures

Via

Access Paper or Ask Questions

How to Combine Variational Bayesian Networks in Federated Learning

Jun 22, 2022

Atahan Ozer, Kadir Burak Buldu, Abdullah Akgül, Gozde Unal

Figure 1 for How to Combine Variational Bayesian Networks in Federated Learning

Figure 2 for How to Combine Variational Bayesian Networks in Federated Learning

Figure 3 for How to Combine Variational Bayesian Networks in Federated Learning

Figure 4 for How to Combine Variational Bayesian Networks in Federated Learning

Abstract:Federated Learning enables multiple data centers to train a central model collaboratively without exposing any confidential data. Even though deterministic models are capable of performing high prediction accuracy, their lack of calibration and capability to quantify uncertainty is problematic for safety-critical applications. Different from deterministic models, probabilistic models such as Bayesian neural networks are relatively well-calibrated and able to quantify uncertainty alongside their competitive prediction accuracy. Both of the approaches appear in the federated learning framework; however, the aggregation scheme of deterministic models cannot be directly applied to probabilistic models since weights correspond to distributions instead of point estimates. In this work, we study the effects of various aggregation schemes for variational Bayesian neural networks. With empirical results on three image classification datasets, we observe that the degree of spread for an aggregated distribution is a significant factor in the learning process. Hence, we present an investigation on the question of how to combine variational Bayesian networks in federated learning, while providing benchmarks for different aggregation settings.

Via

Access Paper or Ask Questions

Continual Learning of Multi-modal Dynamics with External Memory

Mar 02, 2022

Abdullah Akgül, Gozde Unal, Melih Kandemir

Figure 1 for Continual Learning of Multi-modal Dynamics with External Memory

Figure 2 for Continual Learning of Multi-modal Dynamics with External Memory

Figure 3 for Continual Learning of Multi-modal Dynamics with External Memory

Figure 4 for Continual Learning of Multi-modal Dynamics with External Memory

Abstract:We study the problem of fitting a model to a dynamical environment when new modes of behavior emerge sequentially. The learning model is aware when a new mode appears, but it does not have access to the true modes of individual training sequences. We devise a novel continual learning method that maintains a descriptor of the mode of an encountered sequence in a neural episodic memory. We employ a Dirichlet Process prior on the attention weights of the memory to foster efficient storage of the mode descriptors. Our method performs continual learning by transferring knowledge across tasks by retrieving the descriptors of similar modes of past tasks to the mode of a current sequence and feeding this descriptor into its transition kernel as control input. We observe the continual learning performance of our method to compare favorably to the mainstream parameter transfer approach.

Via

Access Paper or Ask Questions

Evidential Turing Processes

Jun 02, 2021

Melih Kandemir, Abdullah Akgül, Manuel Haussmann, Gozde Unal

Figure 1 for Evidential Turing Processes

Figure 2 for Evidential Turing Processes

Figure 3 for Evidential Turing Processes

Figure 4 for Evidential Turing Processes

Abstract:A probabilistic classifier with reliable predictive uncertainties i) fits successfully to the target domain data, ii) provides calibrated class probabilities in difficult regions of the target domain (e.g. class overlap), and iii) accurately identifies queries coming out of the target domain and reject them. We introduce an original combination of evidential deep learning, neural processes, and neural Turing machines capable of providing all three essential properties mentioned above for total uncertainty quantification. We observe our method on three image classification benchmarks and two neural net architectures to consistently give competitive or superior scores with respect to multiple uncertainty quantification metrics against state-of-the-art methods explicitly tailored to one or a few of them. Our unified solution delivers an implementation-friendly and computationally efficient recipe for safety clearance and provides intellectual economy to an investigation of algorithmic roots of epistemic awareness in deep neural nets.

Via

Access Paper or Ask Questions