Francesco Corda

Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes

Jun 13, 2023

Luca Sabbioni, Francesco Corda, Marcello Restelli

Abstract: Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical groundings and good properties in continuous action spaces. Unfortunately, these methods require precise and problem-specific hyperparameter tuning to achieve good performance, and tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the selection of the step size has a crucial impact on their ability to learn a highly performing policy, affecting the speed and the stability of the training process, and often being the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, by introducing a new formulation, known as meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound to the difference of performance in different tasks, we adopt the proposed framework to train a batch RL algorithm to dynamically recommend the most adequate step size for different policies and tasks. In conclusion, we present an experimental campaign to show the advantages of selecting an adaptive learning rate in heterogeneous environments.
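
As a rough illustration of why a single fixed step size can struggle across heterogeneous tasks, the toy sketch below runs plain gradient ascent on a one-dimensional quadratic objective whose curvature differs between two hypothetical "tasks". This is not the paper's meta-MDP or its batch RL method; the objective, the task names, the curvature values, and the function `gradient_ascent` are all assumptions made for illustration only.

```python
def gradient_ascent(curvature, step_size, theta0=1.0, n_steps=50):
    """Plain gradient ascent on the toy objective J(theta) = -0.5 * curvature * theta**2.

    The optimum is theta = 0, so |theta| after training measures the remaining error.
    """
    theta = theta0
    for _ in range(n_steps):
        grad = -curvature * theta          # dJ/dtheta for the toy objective
        theta = theta + step_size * grad   # ascent update with a fixed step size
    return theta


# Two hypothetical tasks (contexts) that differ only in curvature (assumed values).
tasks = {"flat": 0.1, "sharp": 10.0}

# Distance from the optimum after training, for each (task, step size) pair.
results = {}
for name, curvature in tasks.items():
    for step_size in (0.01, 0.15, 0.5):
        results[(name, step_size)] = abs(gradient_ascent(curvature, step_size))

for key, error in sorted(results.items()):
    print(key, f"{error:.3e}")
```

In this toy setting, the largest step size makes the fastest progress on the flat task but diverges on the sharp one, while the smallest step size is stable on both yet barely moves on the flat task. A step size chosen per task, or per policy and task as the abstract proposes, avoids that trade-off.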
