Andreas Doerr

Trajectory-Based Off-Policy Deep Reinforcement Learning

May 14, 2019

Andreas Doerr, Michael Volpp, Marc Toussaint, Sebastian Trimpe, Christian Daniel

Abstract: Policy gradient methods are powerful reinforcement learning algorithms and have been demonstrated to solve many complex tasks. However, these methods are also data-inefficient, afflicted with high-variance gradient estimates, and frequently get stuck in local optima. This work addresses these weaknesses by combining recent improvements in the reuse of off-policy data and exploration in parameter space with deterministic behavioral policies. The resulting objective is amenable to standard neural network optimization strategies like stochastic gradient descent or stochastic gradient Hamiltonian Monte Carlo. Incorporation of previous rollouts via importance sampling greatly improves data-efficiency, whilst stochastic optimization schemes facilitate the escape from local optima. We evaluate the proposed approach on a series of continuous control benchmark tasks. The results show that the proposed algorithm is able to successfully and reliably learn solutions using fewer system interactions than standard policy gradient methods.

* Includes appendix. Accepted for ICML 2019
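The abstract above mentions reusing previous rollouts via importance sampling with exploration in parameter space. The following is a minimal sketch of that general idea, not the paper's objective: it assumes an isotropic Gaussian search distribution over policy parameters and uses a generic self-normalized importance-sampling estimator. All function names and the rollout tuple layout are hypothetical.

```python
import math

def gaussian_logpdf(theta, mean, std):
    """Log-density of an isotropic Gaussian over a parameter vector."""
    return sum(
        -0.5 * ((t - m) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)
        for t, m in zip(theta, mean)
    )

def is_expected_return(rollouts, target_mean, target_std):
    """Self-normalized importance-sampling estimate of the expected return.

    rollouts: list of (theta, behavior_mean, behavior_std, ret) tuples, where
    theta was sampled from the behavior distribution and then executed as a
    deterministic policy, yielding the return ret. (Hypothetical layout.)
    """
    log_w = [
        gaussian_logpdf(theta, target_mean, target_std)
        - gaussian_logpdf(theta, b_mean, b_std)
        for theta, b_mean, b_std, _ in rollouts
    ]
    max_log_w = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(weights)
    return sum(w * r for w, (_, _, _, r) in zip(weights, rollouts)) / total

# Two old rollouts collected under a search distribution centered at 0.
rollouts = [([0.0], [0.0], 1.0, 1.0), ([1.0], [0.0], 1.0, 3.0)]
on_policy = is_expected_return(rollouts, [0.0], 1.0)  # plain average of returns
shifted = is_expected_return(rollouts, [1.0], 1.0)    # upweights the theta=1 rollout
```

When the target distribution equals the behavior distribution, all weights coincide and the estimate reduces to the sample mean; moving the target toward parameters that produced higher returns raises the estimate without any new system interaction.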

Meta-Learning Acquisition Functions for Bayesian Optimization

Apr 09, 2019

Michael Volpp, Lukas Fröhlich, Andreas Doerr, Frank Hutter, Christian Daniel

Abstract: Many practical applications of machine learning require data-efficient black-box function optimization, e.g., to identify hyperparameters or process settings. However, readily available algorithms are typically designed to be universal optimizers and are, thus, often suboptimal for specific tasks. We therefore propose a method to learn optimizers which are automatically adapted to a given class of objective functions, e.g., in the context of sim-to-real applications. Instead of learning optimization from scratch, the proposed approach is firmly based within the famous Bayesian optimization framework. Only the acquisition function (AF) is replaced by a learned neural network and therefore the resulting algorithm is still able to exploit the proven generalization capabilities of Gaussian processes. We present experiments on several simulated tasks as well as on a sim-to-real transfer task. The results show that the learned optimizers (1) consistently perform better than or on-par with known AFs on general function classes and (2) can automatically identify structural properties of a function class using cheap simulations and transfer this knowledge to adapt rapidly to real hardware tasks, thereby significantly outperforming existing problem-agnostic AFs.
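For context on what the learned network replaces, here is a sketch of one widely used hand-designed acquisition function, Expected Improvement (EI), in its closed form for maximization. This is a generic baseline, not the learned AF from the paper; the Gaussian process posterior mean and standard deviation are assumed to be supplied by some GP model, and the function names are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best):
    """Closed-form EI for maximization at one candidate point.

    mu, sigma: GP posterior mean and standard deviation at the candidate.
    best: best objective value observed so far.
    """
    if sigma <= 0.0:
        # No posterior uncertainty: improvement is deterministic.
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

def pick_next(candidates, best):
    """Return the candidate x maximizing EI; candidates is a list of (x, mu, sigma)."""
    return max(candidates, key=lambda c: expected_improvement(c[1], c[2], best))[0]

# A confident but mediocre point versus an uncertain one.
choice = pick_next([("a", 0.9, 0.01), ("b", 0.5, 1.0)], best=1.0)
```

EI trades off exploitation (high mean) against exploration (high uncertainty) with a fixed formula; a learned AF keeps the same interface, mapping posterior statistics to a score, but adapts the trade-off to a class of objectives.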

Learning Gaussian Processes by Minimizing PAC-Bayesian Generalization Bounds

Oct 29, 2018

David Reeb, Andreas Doerr, Sebastian Gerwinn, Barbara Rakitsch

Abstract: Gaussian Processes (GPs) are a generic modelling tool for supervised learning. While they have been successfully applied to large datasets, their use in safety-critical applications is hindered by the lack of good performance guarantees. To this end, we propose a method to learn GPs and their sparse approximations by directly optimizing a PAC-Bayesian bound on their generalization performance, instead of maximizing the marginal likelihood. Besides its theoretical appeal, we find in our evaluation that our learning method is robust and yields significantly better generalization guarantees than other common GP approaches on several regression benchmark datasets.

* 11 pages main text, 12 pages appendix. Camera-ready version submitted to NIPS 2018
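To illustrate what optimizing a PAC-Bayesian bound involves, the sketch below evaluates the classic relaxed McAllester-style bound for a loss bounded in [0, 1]: with probability at least 1 - delta, the true risk is at most the empirical risk plus sqrt((KL(Q||P) + ln(2 sqrt(N) / delta)) / (2N)). The paper may use a different or tighter variant; the univariate Gaussian prior and posterior here are an assumption made purely for illustration.

```python
import math

def kl_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) between univariate Gaussians Q = N(mu_q, sigma_q^2), P = N(mu_p, sigma_p^2)."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )

def pac_bayes_bound(empirical_risk, kl, n, delta=0.05):
    """Upper bound on the true risk, assuming a loss bounded in [0, 1]."""
    complexity = (kl + math.log(2.0 * math.sqrt(n) / delta)) / (2.0 * n)
    return empirical_risk + math.sqrt(complexity)

# A posterior that moves far from the prior pays a larger complexity penalty.
near = pac_bayes_bound(0.10, kl_gaussians(0.1, 1.0, 0.0, 1.0), n=1000)
far = pac_bayes_bound(0.10, kl_gaussians(3.0, 0.2, 0.0, 1.0), n=1000)
```

Minimizing such a bound over the posterior balances data fit (the empirical risk) against distance from the prior (the KL term), which is the trade-off that replaces marginal-likelihood maximization in this setting.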

Probabilistic Recurrent State-Space Models

Feb 10, 2018

Andreas Doerr, Christian Daniel, Martin Schiegg, Duy Nguyen-Tuong, Stefan Schaal, Marc Toussaint, Sebastian Trimpe

Abstract: State-space models (SSMs) are a highly expressive model class for learning patterns in time series data and for system identification. Deterministic versions of SSMs (e.g., LSTMs) proved extremely successful in modeling complex time series data. Fully probabilistic SSMs, however, are often found hard to train, even for smaller problems. To overcome this limitation, we propose a novel model formulation and a scalable training algorithm based on doubly stochastic variational inference and Gaussian processes. In contrast to existing work, the proposed variational approximation allows one to fully capture the latent state temporal correlations. These correlations are the key to robust training. The effectiveness of the proposed PR-SSM is evaluated on a set of real-world benchmark datasets in comparison to state-of-the-art probabilistic model learning methods. Scalability and robustness are demonstrated on a high-dimensional problem.

Model-Based Policy Search for Automatic Tuning of Multivariate PID Controllers

Mar 08, 2017

Andreas Doerr, Duy Nguyen-Tuong, Alonso Marco, Stefan Schaal, Sebastian Trimpe

Abstract: PID control architectures are widely used in industrial applications. Despite their low number of open parameters, tuning multiple, coupled PID controllers can become tedious in practice. In this paper, we extend PILCO, a model-based policy search framework, to automatically tune multivariate PID controllers purely based on data observed on an otherwise unknown system. The system's state is extended appropriately to frame the PID policy as a static state feedback policy. This renders PID tuning possible as the solution of a finite horizon optimal control problem without further a priori knowledge. The framework is applied to the task of balancing an inverted pendulum on a seven degree-of-freedom robotic arm, thereby demonstrating its capabilities of fast and data-efficient policy learning, even on complex real-world problems.

* Accepted final version to appear in 2017 IEEE International Conference on Robotics and Automation (ICRA)
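The reframing described in the abstract, a PID policy expressed as static feedback on an extended state, can be sketched for a single loop as follows. This is only an illustration of the idea, not PILCO and not the authors' state construction: the augmented state (error, integrated error, finite-difference derivative), the first-order plant, and all names are assumptions.

```python
def augment_state(error, prev_error, integral, dt):
    """Extend the tracking error with its running integral and finite-difference derivative."""
    integral = integral + error * dt
    derivative = (error - prev_error) / dt
    return (error, integral, derivative)

def pid_as_state_feedback(gains, z):
    """Static linear feedback u = K z on the augmented state z = (e, int e, de/dt)."""
    return sum(k * zi for k, zi in zip(gains, z))

def simulate(gains, setpoint=1.0, steps=200, dt=0.05):
    """Closed loop on a hypothetical first-order plant x' = -x + u (explicit Euler)."""
    x, integral = 0.0, 0.0
    prev_error = setpoint - x  # avoids a derivative kick on the first step
    for _ in range(steps):
        error = setpoint - x
        z = augment_state(error, prev_error, integral, dt)
        integral = z[1]
        u = pid_as_state_feedback(gains, z)
        x += dt * (-x + u)
        prev_error = error
    return x

# Gains ordered as (Kp, Ki, Kd); a PI controller drives x toward the setpoint.
final_x = simulate((2.0, 1.0, 0.0))
```

Because the controller is now a fixed gain vector acting on a state, tuning the gains becomes an ordinary policy-parameter search over a finite horizon, which is what makes a policy search framework applicable.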
