Rein Houthooft

Evolved Policy Gradients

Apr 29, 2018

Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

Abstract: We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test-time tasks, and that it exhibits qualitatively different behavior from other popular metalearning algorithms.

Some Considerations on Learning to Explore via Meta-Reinforcement Learning

Mar 03, 2018

Bradly C. Stadie, Ge Yang, Rein Houthooft, Xi Chen, Yan Duan, Yuhuai Wu, Pieter Abbeel, Ilya Sutskever

Abstract: We consider the problem of exploration in meta reinforcement learning. Two new meta reinforcement learning algorithms are suggested: E-MAML and E-$\text{RL}^2$. Results are presented on a novel environment we call "Krazy World" and a set of maze environments. We show E-MAML and E-$\text{RL}^2$ deliver better performance on tasks where exploration is important.

Parameter Space Noise for Exploration

Jan 31, 2018

Matthias Plappert, Rein Houthooft, Prafulla Dhariwal, Szymon Sidor, Richard Y. Chen, Xi Chen, Tamim Asfour, Pieter Abbeel, Marcin Andrychowicz

Abstract: Deep reinforcement learning (RL) methods generally engage in exploratory behavior through noise injection in the action space. An alternative is to add noise directly to the agent's parameters, which can lead to more consistent exploration and a richer set of behaviors. Methods such as evolutionary strategies use parameter perturbations, but discard all temporal structure in the process and require significantly more samples. Combining parameter noise with traditional RL methods makes it possible to get the best of both worlds. We demonstrate that both off- and on-policy methods benefit from this approach through experimental comparison of DQN, DDPG, and TRPO on high-dimensional discrete action environments as well as continuous control tasks. Our results show that RL with parameter noise learns more efficiently than traditional RL with action space noise and evolutionary strategies individually.

* Updated to camera-ready ICLR submission
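
The contrast drawn in the abstract above, between noise in the action space and noise in the parameters, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the linear tanh policy, the names `perturb` and `act`, and the fixed noise scale `sigma` are hypothetical and are not taken from the paper, whose actual scheme may differ in how the noise scale is chosen and applied.

```python
import numpy as np

def perturb(params, sigma, rng):
    """Return a noisy copy of each parameter array (Gaussian noise, std sigma).

    The original parameters are left untouched so learning can update them.
    """
    return {name: w + sigma * rng.standard_normal(w.shape)
            for name, w in params.items()}

def act(params, obs):
    """Deterministic policy: a hypothetical linear layer with tanh squashing."""
    return np.tanh(params["W"] @ obs + params["b"])

rng = np.random.default_rng(0)
params = {"W": np.zeros((2, 3)), "b": np.zeros(2)}

# Sample the perturbation once (for example, at the start of an episode) ...
noisy = perturb(params, sigma=0.1, rng=rng)

# ... then act deterministically with the perturbed parameters. The same
# observation always yields the same action until the noise is resampled.
obs = np.array([1.0, -0.5, 0.25])
a1 = act(noisy, obs)
a2 = act(noisy, obs)
```

With action space noise, `a1` and `a2` would generally differ because fresh noise is added at every step; here they are identical, which is one way to read the "more consistent exploration" mentioned in the abstract.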

#Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning

Dec 05, 2017

Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

Abstract: Count-based exploration algorithms are known to perform near-optimally when used in conjunction with tabular reinforcement learning (RL) methods for solving small discrete Markov decision processes (MDPs). It is generally thought that count-based methods cannot be applied in high-dimensional state spaces, since most states will only occur once. Recent deep RL exploration strategies are able to deal with high-dimensional continuous state spaces through complex heuristics, often relying on optimism in the face of uncertainty or intrinsic motivation. In this work, we describe a surprising finding: a simple generalization of the classic count-based approach can reach near state-of-the-art performance on various high-dimensional and/or continuous deep RL benchmarks. States are mapped to hash codes, which allows their occurrences to be counted with a hash table. These counts are then used to compute a reward bonus according to the classic count-based exploration theory. We find that simple hash functions can achieve surprisingly good results on many challenging tasks. Furthermore, we show that a domain-dependent learned hash code may further improve these results. Detailed analysis reveals important aspects of a good hash function: 1) having appropriate granularity and 2) encoding information relevant to solving the MDP. This exploration strategy achieves near state-of-the-art performance on both continuous control tasks and Atari 2600 games, hence providing a simple yet powerful baseline for solving MDPs that require considerable exploration.

* 10 pages main text + 10 pages supplementary. Published at NIPS 2017
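
The pipeline described in the abstract above (hash the state, count the hash code, turn the count into a reward bonus) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the class name `SimHashCounter`, the sign-of-random-projection hash, the code length `k`, and the bonus form `beta / sqrt(count)` are choices made here for the example.

```python
import numpy as np
from collections import defaultdict

class SimHashCounter:
    """Counts visits per hash code and returns a count-based reward bonus."""

    def __init__(self, state_dim, k=16, beta=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random projection; the signs of A @ state form a k-bit code.
        self.A = rng.standard_normal((k, state_dim))
        self.beta = beta
        self.counts = defaultdict(int)  # hash table: code -> visit count

    def _code(self, state):
        bits = (self.A @ np.asarray(state, dtype=float)) >= 0
        return bits.tobytes()  # bytes are hashable, so usable as a dict key

    def bonus(self, state):
        """Record one visit to the state's code and return beta / sqrt(count)."""
        key = self._code(state)
        self.counts[key] += 1
        return self.beta / np.sqrt(self.counts[key])

counter = SimHashCounter(state_dim=4, beta=1.0)
s = np.ones(4)
first = counter.bonus(s)   # first visit to this code
second = counter.bonus(s)  # second visit: the bonus shrinks
```

The code length `k` controls granularity: a small `k` merges many states into one bucket, while a large `k` makes nearly every state look novel, which corresponds to the first of the two hash-function properties the abstract highlights.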

VIME: Variational Information Maximizing Exploration

Jan 27, 2017

Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel

Abstract: Scalable and effective exploration remains a key challenge in reinforcement learning (RL). While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios. As such, most contemporary RL relies on simple heuristics such as epsilon-greedy exploration or adding Gaussian noise to the controls. This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics. We propose a practical implementation using variational inference in Bayesian neural networks, which efficiently handles continuous state and action spaces. VIME modifies the MDP reward function, and can be applied with several different underlying RL algorithms. We demonstrate that VIME achieves significantly better performance compared to heuristic exploration methods across a variety of continuous control tasks and algorithms, including tasks with very sparse rewards.

* Published in Advances in Neural Information Processing Systems 29 (NIPS), pages 1109-1117
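
The statement that VIME "modifies the MDP reward function" can be made concrete with one equation. The form below is a sketch of how an information-gain bonus of this kind is usually written. The notation is assumed here and does not appear in the abstract: $q(\theta; \phi_t)$ denotes the variational posterior over the dynamics-model weights $\theta$ after $t$ steps, and $\eta$ is a hypothetical weight trading off exploration against the external reward.

```latex
r'(s_t, a_t, s_{t+1})
  = r(s_t, a_t)
  + \eta \, D_{\mathrm{KL}}\!\left[\, q(\theta; \phi_{t+1}) \,\middle\|\, q(\theta; \phi_t) \,\right]
```

Because only the reward changes, any RL algorithm that consumes $(s_t, a_t, r', s_{t+1})$ tuples can be used unchanged, which is consistent with the abstract's claim that the method works with several underlying algorithms.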

InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets

Jun 12, 2016

Xi Chen, Yan Duan, Rein Houthooft, John Schulman, Ilya Sutskever, Pieter Abbeel

Abstract: This paper describes InfoGAN, an information-theoretic extension to the Generative Adversarial Network that is able to learn disentangled representations in a completely unsupervised manner. InfoGAN is a generative adversarial network that also maximizes the mutual information between a small subset of the latent variables and the observation. We derive a lower bound to the mutual information objective that can be optimized efficiently, and show that our training procedure can be interpreted as a variation of the Wake-Sleep algorithm. Specifically, InfoGAN successfully disentangles writing styles from digit shapes on the MNIST dataset, pose from lighting of 3D rendered images, and background digits from the central digit on the SVHN dataset. It also discovers visual concepts that include hair styles, presence/absence of eyeglasses, and emotions on the CelebA face dataset. Experiments show that InfoGAN learns interpretable representations that are competitive with representations learned by existing fully supervised methods.
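
The lower bound mentioned in the abstract above can be sketched with the standard variational argument. The symbols are assumptions of this note, not of the abstract: $c$ is the small subset of latent variables, $z$ the remaining noise, $G$ the generator, $D$ the discriminator, $V(D, G)$ the usual GAN value function, $Q(c \mid x)$ an auxiliary distribution approximating the true posterior, and $\lambda$ a weighting hyperparameter.

```latex
\begin{align}
I\big(c;\, G(z, c)\big)
  &\ge \mathbb{E}_{c \sim P(c),\; x \sim G(z, c)}\big[\log Q(c \mid x)\big] + H(c)
  \;=:\; L_I(G, Q), \\
\min_{G,\, Q} \max_{D} \quad
  & V(D, G) - \lambda \, L_I(G, Q).
\end{align}
```

The bound holds because replacing the true posterior with $Q$ drops a non-negative KL term; it is tight when $Q$ matches the true posterior, and $H(c)$ is a constant when the distribution of $c$ is fixed.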

Benchmarking Deep Reinforcement Learning for Continuous Control

May 27, 2016

Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel

Abstract: Recently, researchers have made significant progress combining the advances in deep learning for learning feature representations with reinforcement learning. Some notable examples include training agents to play Atari games based on raw pixel data and to acquire advanced manipulation skills using raw sensory inputs. However, it has been difficult to quantify progress in the domain of continuous control due to the lack of a commonly adopted benchmark. In this work, we present a benchmark suite of continuous control tasks, including classic tasks like cart-pole swing-up, tasks with very high state and action dimensionality such as 3D humanoid locomotion, tasks with partial observations, and tasks with hierarchical structure. We report novel findings based on the systematic evaluation of a range of implemented reinforcement learning algorithms. Both the benchmark and reference implementations are released at https://github.com/rllab/rllab in order to facilitate experimental reproducibility and to encourage adoption by other researchers.

* 14 pages, ICML 2016

Integrated Inference and Learning of Neural Factors in Structural Support Vector Machines

Mar 03, 2016

Rein Houthooft, Filip De Turck

Abstract: Tackling pattern recognition problems in areas such as computer vision, bioinformatics, speech or text recognition is often done best by taking into account task-specific statistical relations between output variables. In structured prediction, this internal structure is used to predict multiple outputs simultaneously, leading to more accurate and coherent predictions. Structural support vector machines (SSVMs) are nonprobabilistic models that optimize a joint input-output function through margin-based learning. Because SSVMs generally disregard the interplay between unary and interaction factors during the training phase, final parameters are suboptimal. Moreover, their factors are often restricted to linear combinations of input features, limiting their generalization power. To improve prediction accuracy, this paper proposes: (i) joint inference and learning by integration of back-propagation and loss-augmented inference in SSVM subgradient descent; (ii) extending SSVM factors to neural networks that form highly nonlinear functions of input features. Image segmentation benchmark results demonstrate improvements over conventional SSVM training methods in terms of accuracy, highlighting the feasibility of end-to-end SSVM training with neural factors.
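
How loss-augmented inference and back-propagation fit into one subgradient step can be sketched with the standard margin-rescaled structured hinge loss. The notation is assumed for illustration and may differ from the paper's: $F(x, y; w)$ is the joint input-output score (the sum of unary and interaction factors), $\Delta(y, \hat{y})$ is the task loss, and $w$ collects all factor parameters.

```latex
\begin{align}
\ell(x, y; w)
  &= \max_{\hat{y}} \big[ \Delta(y, \hat{y}) + F(x, \hat{y}; w) \big] - F(x, y; w), \\
\hat{y}^{*}
  &= \arg\max_{\hat{y}} \big[ \Delta(y, \hat{y}) + F(x, \hat{y}; w) \big]
  \quad \text{(loss-augmented inference)}, \\
g
  &= \nabla_w F(x, \hat{y}^{*}; w) - \nabla_w F(x, y; w)
  \quad \text{(a subgradient of } \ell \text{ when } F \text{ is linear in } w \text{)}.
\end{align}
```

With linear factors, the two gradient terms reduce to feature vectors. With neural factors, the same expression can be evaluated by back-propagating through the factor networks at $\hat{y}^{*}$ and at $y$, although $g$ is then a descent direction rather than a subgradient in the convex sense; the same loss then updates all factors together, in line with the end-to-end training the abstract describes.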
