Abstract: Over the years, Reinforcement Learning (RL) has established itself as a convenient paradigm for learning optimal policies from data. However, most RL algorithms achieve optimal policies by exploring all possible actions, which in real-world scenarios is often infeasible or impractical due to, e.g., safety constraints. Motivated by this, in this paper we propose to augment RL with Model Predictive Control (MPC), a popular model-based control algorithm that enables optimal control of a system while satisfying a set of constraints. The result is the MPC-augmented RL algorithm (MPCaRL), which uses MPC both to drive how RL explores the actions and to modify the corresponding rewards. We demonstrate the effectiveness of MPCaRL by letting it play the Atari game Pong. The results obtained highlight the ability of the algorithm to learn general tasks with essentially no training.