Thomas Hubert

Optimizing Memory Mapping Using Deep Reinforcement Learning

May 11, 2023

Pengming Wang, Mikita Sazanovich, Berkin Ilbeyi, Phitchaya Mangpo Phothilimthana, Manish Purohit, Han Yang Tay, Ngân Vũ, Miaosen Wang, Cosmin Paduraru, Edouard Leurent(+8 more)

Abstract: Resource scheduling and allocation is a critical component of many high impact systems ranging from congestion control to cloud computing. Finding more optimal solutions to these problems often has significant impact on resource and time savings, reducing device wear-and-tear, and even potentially improving carbon emissions. In this paper, we focus on a specific instance of a scheduling problem, namely the memory mapping problem that occurs during compilation of machine learning programs: that is, mapping tensors to different memory layers to optimize execution time. We introduce an approach for solving the memory mapping problem using Reinforcement Learning. RL is a solution paradigm well-suited for sequential decision making problems that are amenable to planning, and combinatorial search spaces with high-dimensional data inputs. We formulate the problem as a single-player game, which we call the mallocGame, such that high-reward trajectories of the game correspond to efficient memory mappings on the target hardware. We also introduce a Reinforcement Learning agent, mallocMuZero, and show that it is capable of playing this game to discover new and improved memory mapping solutions that lead to faster execution times on real ML workloads on ML accelerators. We compare the performance of mallocMuZero to the default solver used by the Accelerated Linear Algebra (XLA) compiler on a benchmark of realistic ML workloads. In addition, we show that mallocMuZero is capable of improving the execution time of the recently published AlphaTensor matrix multiplication model.

MuZero with Self-competition for Rate Control in VP9 Video Compression

Feb 14, 2022

Amol Mandhane, Anton Zhernov, Maribeth Rauh, Chenjie Gu, Miaosen Wang, Flora Xue, Wendy Shang, Derek Pang, Rene Claus, Ching-Han Chiang(+9 more)

Abstract: Video streaming usage has seen a significant rise as entertainment, education, and business increasingly rely on online video. Optimizing video compression has the potential to increase access and quality of content to users, and reduce energy use and costs overall. In this paper, we present an application of the MuZero algorithm to the challenge of video compression. Specifically, we target the problem of learning a rate control policy to select the quantization parameters (QP) in the encoding process of libvpx, an open source VP9 video compression library widely used by popular video-on-demand (VOD) services. We treat this as a sequential decision making problem to maximize the video quality with an episodic constraint imposed by the target bitrate. Notably, we introduce a novel self-competition based reward mechanism to solve constrained RL with variable constraint satisfaction difficulty, which is challenging for existing constrained RL methods. We demonstrate that the MuZero-based rate control achieves an average 6.28% reduction in size of the compressed videos for the same delivered video quality level (measured as PSNR BD-rate) compared to libvpx's two-pass VBR rate control policy, while having better constraint satisfaction behavior.
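
One way to picture a self-competition reward is sketched below. It is only an illustration under stated assumptions and not the paper's exact mechanism: the class `SelfCompetitionReward`, the `EpisodeOutcome` fields, the exponential moving average baseline, and the `tol` threshold are all hypothetical. The idea shown is that each episode is scored against the agent's own past results, with the bitrate constraint taking priority over quality.

```python
from dataclasses import dataclass


@dataclass
class EpisodeOutcome:
    quality: float    # hypothetical quality score, e.g. mean PSNR (higher is better)
    overshoot: float  # hypothetical max(0, achieved_bitrate - target_bitrate)


def _sign(x: float) -> float:
    return (x > 0) - (x < 0)


class SelfCompetitionReward:
    """Scores an episode as +1 / 0 / -1 against a moving average of past episodes."""

    def __init__(self, decay: float = 0.9, tol: float = 1e-3):
        self.decay = decay    # assumed smoothing factor for the baseline
        self.tol = tol        # overshoot below this counts as "constraint met"
        self.baseline = None  # running average of past outcomes

    def __call__(self, outcome: EpisodeOutcome) -> float:
        if self.baseline is None:
            # Nothing to compete against yet: neutral reward.
            self.baseline = EpisodeOutcome(outcome.quality, outcome.overshoot)
            return 0.0
        base = self.baseline
        if outcome.overshoot > self.tol or base.overshoot > self.tol:
            # Someone violates the constraint: the smaller overshoot wins.
            reward = float(_sign(base.overshoot - outcome.overshoot))
        else:
            # Both satisfy the constraint: the higher quality wins.
            reward = float(_sign(outcome.quality - base.quality))
        d = self.decay
        self.baseline = EpisodeOutcome(
            d * base.quality + (1 - d) * outcome.quality,
            d * base.overshoot + (1 - d) * outcome.overshoot,
        )
        return reward
```

Because the baseline tracks the agent's own history, the comparison stays informative whether the target bitrate is easy or hard to hit for a given video, which is the property the abstract attributes to self-competition.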

Learning and Planning in Complex Action Spaces

Apr 13, 2021

Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Mohammadamin Barekatain, Simon Schmitt, David Silver

Abstract: Many important real-world problems have action spaces that are high-dimensional, continuous or both, making full enumeration of all possible actions infeasible. Instead, only small subsets of actions can be sampled for the purpose of policy evaluation and improvement. In this paper, we propose a general framework to reason in a principled way about policy evaluation and improvement over such sampled action subsets. This sample-based policy iteration framework can in principle be applied to any reinforcement learning algorithm based upon policy iteration. Concretely, we propose Sampled MuZero, an extension of the MuZero algorithm that is able to learn in domains with arbitrarily complex action spaces by planning over sampled actions. We demonstrate this approach on the classical board game of Go and on two continuous control benchmark domains: DeepMind Control Suite and Real-World RL Suite.
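
The idea of improving a policy over a sampled action subset can be sketched as follows. This is a minimal illustration, not Sampled MuZero itself: the function name `sampled_policy_improvement`, the proposal distribution `beta`, and the exponentiated-Q improvement operator (standing in for a tree search) are assumptions. The sketch draws K actions from `beta`, then reweights each sampled action by the ratio of its empirical frequency to its proposal probability so that the sampling step does not bias the result.

```python
import math
import random
from collections import Counter


def sampled_policy_improvement(pi, beta, q_fn, num_samples, temperature=1.0, rng=None):
    """Return an improved policy supported only on actions sampled from beta.

    pi, beta: dicts mapping action -> probability (beta must be > 0 where sampled).
    q_fn: callable giving an action-value estimate for an action (assumed given).
    """
    rng = rng or random.Random()
    actions = list(beta)
    samples = rng.choices(actions, weights=[beta[a] for a in actions], k=num_samples)
    counts = Counter(samples)

    weights = {}
    for action, n in counts.items():
        beta_hat = n / num_samples               # empirical sampling frequency
        correction = beta_hat / beta[action]     # undo the sampling bias
        # Assumed improvement operator: prior times exp(Q / temperature).
        weights[action] = correction * pi[action] * math.exp(q_fn(action) / temperature)

    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}
```

With `beta` equal to `pi`, the correction times the prior reduces to the empirical frequency of each sampled action, so only the K samples and their value estimates are needed and the full action space is never enumerated during improvement.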

Online and Offline Reinforcement Learning by Planning with a Learned Model

Apr 13, 2021

Julian Schrittwieser, Thomas Hubert, Amol Mandhane, Mohammadamin Barekatain, Ioannis Antonoglou, David Silver

Abstract: Learning efficiently from small amounts of data has long been the focus of model-based reinforcement learning, both for the online case when interacting with the environment and the offline case when learning from a fixed dataset. However, to date no single unified algorithm could demonstrate state-of-the-art results in both settings. In this work, we describe the Reanalyse algorithm which uses model-based policy and value improvement operators to compute new improved training targets on existing data points, allowing efficient learning for data budgets varying by several orders of magnitude. We further show that Reanalyse can also be used to learn entirely from demonstrations without any environment interactions, as in the case of offline Reinforcement Learning (offline RL). Combining Reanalyse with the MuZero algorithm, we introduce MuZero Unplugged, a single unified algorithm for any data budget, including offline RL. In contrast to previous work, our algorithm does not require any special adaptations for the off-policy or offline RL settings. MuZero Unplugged sets new state-of-the-art results in the RL Unplugged offline RL benchmark as well as in the online RL benchmark of Atari in the standard 200 million frame setting.

Monte-Carlo Tree Search as Regularized Policy Optimization

Jul 24, 2020

Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos

Abstract: The combination of Monte-Carlo tree search (MCTS) with deep reinforcement learning has led to significant advances in artificial intelligence. However, AlphaZero, the current state-of-the-art MCTS algorithm, still relies on handcrafted heuristics that are only partially understood. In this paper, we show that AlphaZero's search heuristics, along with other common ones such as UCT, are an approximation to the solution of a specific regularized policy optimization problem. With this insight, we propose a variant of AlphaZero which uses the exact solution to this policy optimization problem, and show experimentally that it reliably outperforms the original algorithm in multiple domains.

* Accepted to International Conference on Machine Learning (ICML), 2020
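
A regularized policy optimization problem of this kind can be solved numerically in a few lines. The sketch below assumes the objective is the expected action value minus `lam` times the KL divergence from the prior to the new policy, in which case the maximizer has the form `lam * prior[a] / (alpha - q[a])` for a normalizing constant `alpha`. The function name `regularized_policy`, the fixed `lam` argument (which in a search would depend on visit counts), and the requirement that every prior probability is strictly positive are assumptions made for illustration.

```python
def regularized_policy(prior, q, lam, iters=60):
    """Solve for pi(a) = lam * prior[a] / (alpha - q[a]) with sum(pi) == 1.

    prior: list of strictly positive prior probabilities summing to 1.
    q: list of action-value estimates, same length as prior.
    lam: positive regularization strength (assumed given).
    """
    def total(alpha):
        return sum(lam * p / (alpha - v) for p, v in zip(prior, q))

    # The sum is decreasing in alpha; these bounds bracket the root:
    # at alpha_lo the sum is >= 1, at alpha_hi it is <= 1.
    alpha_lo = max(v + lam * p for p, v in zip(prior, q))
    alpha_hi = max(q) + lam
    for _ in range(iters):
        alpha = 0.5 * (alpha_lo + alpha_hi)
        if total(alpha) > 1.0:
            alpha_lo = alpha
        else:
            alpha_hi = alpha

    alpha = 0.5 * (alpha_lo + alpha_hi)
    policy = [lam * p / (alpha - v) for p, v in zip(prior, q)]
    norm = sum(policy)  # remove the small residual bisection error
    return [x / norm for x in policy]
```

A large `lam` keeps the result close to the prior, while a small `lam` concentrates mass on the actions with the highest value estimates.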

Mastering Atari, Go, Chess and Shogi by Planning with a Learned Model

Nov 19, 2019

Julian Schrittwieser, Ioannis Antonoglou, Thomas Hubert, Karen Simonyan, Laurent Sifre, Simon Schmitt, Arthur Guez, Edward Lockhart, Demis Hassabis, Thore Graepel(+2 more)

Abstract: Constructing agents with planning capabilities has long been one of the main challenges in the pursuit of artificial intelligence. Tree-based planning methods have enjoyed huge success in challenging domains, such as chess and Go, where a perfect simulator is available. However, in real-world problems the dynamics governing the environment are often complex and unknown. In this work we present the MuZero algorithm which, by combining a tree-based search with a learned model, achieves superhuman performance in a range of challenging and visually complex domains, without any knowledge of their underlying dynamics. MuZero learns a model that, when applied iteratively, predicts the quantities most directly relevant to planning: the reward, the action-selection policy, and the value function. When evaluated on 57 different Atari games - the canonical video game environment for testing AI techniques, in which model-based planning approaches have historically struggled - our new algorithm achieved a new state of the art. When evaluated on Go, chess and shogi, without any knowledge of the game rules, MuZero matched the superhuman performance of the AlphaZero algorithm that was supplied with the game rules.
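
Applying a learned model iteratively, as the abstract describes, might look like the sketch below. It is a structural illustration only: the three callables `representation`, `dynamics`, and `prediction`, the `StepPrediction` record, and the toy stand-ins at the bottom are hypothetical placeholders for trained networks. The point shown is that after the first step the model works purely on its own hidden state and never queries the real environment.

```python
from typing import Any, Callable, List, NamedTuple, Sequence, Tuple


class StepPrediction(NamedTuple):
    reward: float
    policy: List[float]
    value: float


def unroll(
    representation: Callable[[Any], Any],
    dynamics: Callable[[Any, int], Tuple[float, Any]],
    prediction: Callable[[Any], Tuple[List[float], float]],
    observation: Any,
    actions: Sequence[int],
) -> List[StepPrediction]:
    """Predict reward, policy and value along a hypothetical action sequence."""
    state = representation(observation)       # encode the observation once
    policy, value = prediction(state)
    outputs = [StepPrediction(0.0, policy, value)]  # no reward before acting
    for action in actions:
        reward, state = dynamics(state, action)     # step in hidden-state space
        policy, value = prediction(state)
        outputs.append(StepPrediction(reward, policy, value))
    return outputs


# Toy stand-ins (assumptions) so the sketch runs without any trained network.
def toy_representation(obs):
    return float(sum(obs))


def toy_dynamics(state, action):
    return float(action), state + action


def toy_prediction(state):
    return [0.5, 0.5], state


trajectory = unroll(toy_representation, toy_dynamics, toy_prediction, [1, 2], [1, 0, 1])
```

A tree search would call the same three functions to expand nodes, using the predicted policy as a prior and the predicted reward and value to back up estimates.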

Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm

Dec 05, 2017

David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel(+3 more)

Abstract: The game of chess is the most widely-studied domain in the history of artificial intelligence. The strongest programs are based on a combination of sophisticated search techniques, domain-specific adaptations, and handcrafted evaluation functions that have been refined by human experts over several decades. In contrast, the AlphaGo Zero program recently achieved superhuman performance in the game of Go, by tabula rasa reinforcement learning from games of self-play. In this paper, we generalise this approach into a single AlphaZero algorithm that can achieve, tabula rasa, superhuman performance in many challenging domains. Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
