Abstract: Despite tremendous progress, machine learning and deep learning still suffer from incomprehensible predictions. Incomprehensibility, however, is not acceptable when (deep) reinforcement learning is used in the real world, as unpredictable actions can seriously harm the individuals involved. In this work, we propose a genetic programming framework that generates explanations for the decision-making process of already trained agents by imitating them with programs. Programs are interpretable and can be executed to explain why the agent chooses a particular action. Furthermore, we conduct an ablation study that investigates how extending the domain-specific language via library learning alters the performance of the method. We compare our results with the previous state of the art for this problem and show that we achieve comparable performance while requiring far fewer hardware resources and less computation time.
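Although the paper's genetic programming framework is not reproduced here, the core idea of imitating a black-box policy with evolved programs can be sketched as follows. Everything in this snippet is illustrative: the `oracle` stand-in for the trained agent, the two observation features, and the tiny if-then-else DSL are assumptions for the example, not the paper's actual domain-specific language.

    # Minimal sketch (not the paper's implementation): imitating a black-box
    # policy with genetic programming over a tiny, interpretable DSL.
    import random

    ACTIONS = [0, 1, 2, 3]            # e.g. up, down, left, right
    FEATURES = ["dx", "dy"]           # hypothetical observation features

    def oracle(obs):
        """Stand-in for the trained black-box agent we want to imitate."""
        return 0 if obs["dy"] < 0 else 1 if obs["dy"] > 0 else (2 if obs["dx"] < 0 else 3)

    # A program is a nested tuple: ("if", feature, threshold, then_branch, else_branch)
    # or a leaf ("act", action). Executing it on an observation yields an action.
    def random_program(depth=3):
        if depth == 0 or random.random() < 0.3:
            return ("act", random.choice(ACTIONS))
        return ("if", random.choice(FEATURES), random.choice([-1, 0, 1]),
                random_program(depth - 1), random_program(depth - 1))

    def run(prog, obs):
        if prog[0] == "act":
            return prog[1]
        _, feat, thr, then_b, else_b = prog
        return run(then_b if obs[feat] < thr else else_b, obs)

    def fitness(prog, states):
        """Fraction of sampled states where the program matches the agent."""
        return sum(run(prog, s) == oracle(s) for s in states) / len(states)

    def mutate(prog):
        # Either replace a subtree with a fresh random one or recurse into branches.
        if random.random() < 0.3:
            return random_program(depth=2)
        if prog[0] == "act":
            return prog
        return ("if", prog[1], prog[2], mutate(prog[3]), mutate(prog[4]))

    states = [{"dx": random.randint(-2, 2), "dy": random.randint(-2, 2)} for _ in range(200)]
    pop = [random_program() for _ in range(100)]
    for gen in range(30):                         # simple truncation-selection evolution loop
        pop.sort(key=lambda p: fitness(p, states), reverse=True)
        pop = pop[:20] + [mutate(random.choice(pop[:20])) for _ in range(80)]
    best = max(pop, key=lambda p: fitness(p, states))
    print("imitation accuracy:", fitness(best, states))
    print("program:", best)

The evolved tuple can be printed and read directly, which is what makes the imitation explanatory rather than another black box.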
Abstract: Understanding the interactions of agents trained with deep reinforcement learning is crucial for deploying agents in games or in the real world. In the former, unreasonable actions confuse players. In the latter, the effect is even more significant, as unexpected behavior causes accidents with potentially grave and long-lasting consequences for the individuals involved. In this work, we propose using program synthesis to imitate reinforcement learning policies after observing a trajectory of the action sequence. Programs have the advantage that they are inherently interpretable and verifiable for correctness. We adapt the state-of-the-art program synthesis system DreamCoder for learning concepts in grid-based environments, specifically a navigation task and two miniature versions of the Atari games Space Invaders and Asterix. By inspecting the generated libraries, we can make inferences about the concepts the black-box agent has learned and better understand the agent's behavior. We achieve the same by visualizing the agent's decision-making process for the imitated sequences. We evaluate our approach with different types of program synthesizers based on a search-only method, a neural-guided search, and a language model fine-tuned on code.
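As a rough illustration of the search-only variant, the sketch below enumerates decision-list programs over a toy DSL until one reproduces a recorded action trajectory. It is not the DreamCoder setup used in the paper; the `dist` feature, the predicate set, and the action names are invented for the example.

    # Minimal sketch (assumptions, not the paper's system): a search-only
    # synthesizer that enumerates programs over a toy DSL until one exactly
    # reproduces the agent's actions on a recorded trajectory.
    from itertools import product

    # Toy trajectory of (observation, action) pairs, e.g. distance to a target column.
    trajectory = [({"dist": d}, "shoot" if d == 0 else ("left" if d < 0 else "right"))
                  for d in [-2, -1, 0, 1, 2, 0, -1]]

    # DSL: predicates over the observation and a small action set.
    PREDICATES = {
        "dist<0": lambda o: o["dist"] < 0,
        "dist>0": lambda o: o["dist"] > 0,
        "dist==0": lambda o: o["dist"] == 0,
    }
    ACTIONS = ["left", "right", "shoot", "noop"]

    def synthesize(trajectory, max_rules=3):
        """Enumerate decision lists (if pred -> action, ..., else default action)."""
        preds = list(PREDICATES)
        for n in range(1, max_rules + 1):
            for chosen in product(preds, repeat=n):
                for acts in product(ACTIONS, repeat=n + 1):   # last entry = default
                    def program(obs, chosen=chosen, acts=acts):
                        for p, a in zip(chosen, acts):
                            if PREDICATES[p](obs):
                                return a
                        return acts[-1]
                    if all(program(o) == a for o, a in trajectory):
                        return list(zip(chosen, acts)) + [("else", acts[-1])]
        return None

    print(synthesize(trajectory))

Neural guidance or a fine-tuned language model would replace the blind enumeration order with a learned proposal distribution, but the notion of a program that exactly explains the observed action sequence stays the same.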
Abstract: Balancing game levels in a competitive two-player context involves a great deal of manual work and testing, particularly for non-symmetrical levels. In this paper, we propose an architecture for the automated balancing of tile-based levels within the recently introduced PCGRL framework (procedural content generation via reinforcement learning). Our architecture is divided into three parts: (1) a level generator, (2) a balancing agent, and (3) a reward modeling simulation. By repeatedly playing the level in a simulation, the balancing agent is rewarded for modifying it towards equal win rates for all players. To this end, we introduce a novel family of swap-based representations to increase robustness with respect to playability. We show that this approach can teach an agent how to alter a level towards balance better and faster than plain PCGRL. In addition, by analyzing the agent's swapping behavior, we can draw conclusions about which tile types influence the balancing the most. We evaluate our approach and present results using the Neural MMO (NMMO) environment in a competitive two-player setting.
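The swap-based idea can be illustrated with a minimal sketch: the agent proposes swapping two tile positions, a simulation estimates the win rate, and the reward is the resulting improvement in balance. The `simulate` function below is a crude stand-in for the NMMO forward simulation, and the tile names are hypothetical.

    # Minimal sketch (illustrative, not the paper's NMMO setup): one swap-based
    # balancing step with a reward for moving the win rate towards 0.5 / 0.5.
    import random

    def simulate(level, episodes=20):
        """Stand-in for the forward simulation; returns player 1's win rate."""
        advantage = (sum(row.count("food") for row in level[: len(level) // 2])
                     - sum(row.count("food") for row in level[len(level) // 2 :]))
        wins = sum(random.random() < 0.5 + 0.1 * advantage for _ in range(episodes))
        return wins / episodes

    def balance_reward(level, swap):
        """Swap two tile positions and reward the improvement in balance."""
        (r1, c1), (r2, c2) = swap
        before = abs(simulate(level) - 0.5)
        level[r1][c1], level[r2][c2] = level[r2][c2], level[r1][c1]   # swap representation
        after = abs(simulate(level) - 0.5)
        return before - after        # positive if the level became more balanced

    level = [["grass"] * 6 for _ in range(6)]
    level[0][0] = level[0][1] = "food"            # all resources on player 1's side
    print(balance_reward(level, ((0, 1), (5, 5))))

Because a swap never adds or removes tiles, the tile distribution of the generated level is preserved, which is what makes this representation more robust with respect to playability than free tile edits.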