Anoop Korattikara

UC Irvine

Measuring the Reliability of Reinforcement Learning Algorithms

Dec 10, 2019

Stephanie C. Y. Chan, Sam Fishman, John Canny, Anoop Korattikara, Sergio Guadarrama

Abstract: Lack of reliability is a well-known issue for reinforcement learning (RL) algorithms. This problem has gained increasing attention in recent years, and efforts to improve it have grown substantially. To aid RL researchers and production users with the evaluation and improvement of reliability, we propose a set of metrics that quantitatively measure different aspects of reliability. In this work, we focus on variability and risk, both during training and after learning (on a fixed policy). We designed these metrics to be general-purpose, and we also designed complementary statistical tests to enable rigorous comparisons on these metrics. In this paper, we first describe the desired properties of the metrics and their design, the aspects of reliability that they measure, and their applicability to different scenarios. We then describe the statistical tests and make additional practical recommendations for reporting results. The metrics and accompanying statistical tools have been made available as an open-source library, here: https://github.com/google-research/rl-reliability-metrics . We apply our metrics to a set of common RL algorithms and environments, compare them, and analyze the results.

* Accepted at the Workshop on Deep Reinforcement Learning at the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada
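
As a rough illustration of the "across runs" flavour of these metrics, the sketch below computes one dispersion statistic and one risk statistic over the final performance of several independent training runs. It assumes, following our reading of the paper, that dispersion is measured with the interquartile range (IQR) and risk with the conditional value at risk (CVaR) of the lower tail. The function names are hypothetical; this is not the API of the rl-reliability-metrics library, which also covers within-run and fixed-policy variants.

```python
import math
from typing import Sequence, Tuple


def quantile(xs: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of values that are already sorted."""
    pos = q * (len(xs) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def iqr(values: Sequence[float]) -> float:
    """Dispersion: spread of the middle 50% of the values."""
    xs = sorted(values)
    return quantile(xs, 0.75) - quantile(xs, 0.25)


def cvar_lower(values: Sequence[float], alpha: float = 0.05) -> float:
    """Risk: mean of the worst alpha-fraction of values (at least one value)."""
    xs = sorted(values)
    k = max(1, math.ceil(alpha * len(xs)))
    return sum(xs[:k]) / k


def across_runs(final_perf: Sequence[float], alpha: float = 0.05) -> Tuple[float, float]:
    """(dispersion, risk) of final performance over independent runs (hypothetical helper)."""
    return iqr(final_perf), cvar_lower(final_perf, alpha)
```

For example, `across_runs([1, 2, 3, 4, 5], alpha=0.4)` returns `(2.0, 1.5)`: the middle half of the runs spans 2 units, and the worst 40% of runs average 1.5. Higher IQR means less reproducible training; lower CVaR means worse bad-case runs.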

From Language to Goals: Inverse Reinforcement Learning for Vision-Based Instruction Following

Feb 20, 2019

Justin Fu, Anoop Korattikara, Sergey Levine, Sergio Guadarrama

Abstract: Reinforcement learning is a promising framework for solving control problems, but its use in practical situations is hampered by the fact that reward functions are often difficult to engineer. Specifying goals and tasks for autonomous machines, such as robots, is a significant challenge: conventionally, reward functions and goal states have been used to communicate objectives. But people can communicate objectives to each other simply by describing or demonstrating them. How can we build learning algorithms that will allow us to tell machines what we want them to do? In this work, we investigate the problem of grounding language commands as reward functions using inverse reinforcement learning, and argue that language-conditioned rewards are more transferable than language-conditioned policies to new environments. We propose language-conditioned reward learning (LC-RL), which grounds language commands as a reward function represented by a deep neural network. We demonstrate that our model learns rewards that transfer to novel tasks and environments on realistic, high-dimensional visual environments with natural language commands, whereas directly learning a language-conditioned policy leads to poor performance.
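
LC-RL learns rewards with inverse reinforcement learning. The sketch below shows only the underlying IRL step in the simplest possible setting, assuming a maximum-entropy IRL formulation with known deterministic dynamics and a tabular reward θ[s] on a hypothetical five-state corridor. Language conditioning is left out: in LC-RL the reward would also depend on the command (and, per the abstract, be represented by a deep neural network over visual observations), whereas here a single command is implicit.

```python
import math
from typing import List

N_STATES = 5        # hypothetical 1-D corridor with states 0..4
ACTIONS = (-1, +1)  # move left / move right, clipped at the walls
HORIZON = 6         # number of states visited per trajectory


def step(s: int, a: int) -> int:
    return min(max(s + a, 0), N_STATES - 1)


def soft_policies(theta: List[float]) -> List[List[List[float]]]:
    """Finite-horizon soft value iteration with reward r(s) = theta[s].
    Returns pi[t][s][k], the probability of ACTIONS[k] in state s at time t."""
    v_next = [0.0] * N_STATES
    pis = []
    for _ in range(HORIZON):
        q = [[theta[s] + v_next[step(s, a)] for a in ACTIONS] for s in range(N_STATES)]
        v = [math.log(sum(math.exp(x) for x in row)) for row in q]  # soft max over actions
        pis.append([[math.exp(x - v[s]) for x in q[s]] for s in range(N_STATES)])
        v_next = v
    pis.reverse()  # built backwards in time; index 0 is now the first step
    return pis


def expected_visits(theta: List[float], start: int) -> List[float]:
    """Expected number of visits to each state under the soft-optimal policy."""
    pis = soft_policies(theta)
    d = [0.0] * N_STATES
    d[start] = 1.0
    visits = [0.0] * N_STATES
    for t in range(HORIZON):
        for s in range(N_STATES):
            visits[s] += d[s]
        nd = [0.0] * N_STATES
        for s in range(N_STATES):
            for k, a in enumerate(ACTIONS):
                nd[step(s, a)] += d[s] * pis[t][s][k]
        d = nd
    return visits


def maxent_irl(demo: List[int], iters: int = 500, lr: float = 0.1) -> List[float]:
    """Gradient ascent on the MaxEnt IRL log-likelihood of one demonstration:
    gradient = expert state counts - expected state counts under the model."""
    expert = [0.0] * N_STATES
    for s in demo:
        expert[s] += 1.0
    theta = [0.0] * N_STATES
    for _ in range(iters):
        model = expected_visits(theta, demo[0])
        theta = [th + lr * (e - m) for th, e, m in zip(theta, expert, model)]
    return theta
```

With the demonstration `[0, 1, 2, 3, 4, 4]`, the learned θ is largest at state 4, where the demonstrator goes and stays. Because a reward rather than a policy is learned, it can be re-optimized under different dynamics, which is the transfer argument the abstract makes.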

Speed/accuracy trade-offs for modern convolutional object detectors

Apr 25, 2017

Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama (+1 more)

Abstract: The goal of this paper is to serve as a guide for selecting a detection architecture that achieves the right speed/memory/accuracy balance for a given application and platform. To this end, we investigate various ways to trade accuracy for speed and memory usage in modern convolutional object detection systems. A number of successful systems have been proposed in recent years, but apples-to-apples comparisons are difficult due to different base feature extractors (e.g., VGG, Residual Networks), different default image resolutions, as well as different hardware and software platforms. We present a unified implementation of the Faster R-CNN [Ren et al., 2015], R-FCN [Dai et al., 2016] and SSD [Liu et al., 2015] systems, which we view as "meta-architectures", and trace out the speed/accuracy trade-off curve created by using alternative feature extractors and varying other critical parameters such as image size within each of these meta-architectures. On one extreme end of this spectrum, where speed and memory are critical, we present a detector that achieves real-time speeds and can be deployed on a mobile device. On the opposite end, in which accuracy is critical, we present a detector that achieves state-of-the-art performance measured on the COCO detection task.

* Accepted to CVPR 2017
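
One way to trace out a speed/accuracy trade-off from many benchmarked configurations (meta-architecture, feature extractor, image size, and so on) is to keep only the Pareto-optimal ones: those for which no other configuration is both at least as fast and more accurate. The sketch below assumes each configuration has already been timed and evaluated; the `Config` type, its field names, and the numbers in the usage example are made up for illustration and are not taken from the paper.

```python
from typing import List, NamedTuple


class Config(NamedTuple):
    name: str
    time_ms: float  # inference time per image (lower is better)
    acc: float      # detection accuracy, e.g. mAP (higher is better)


def pareto_frontier(configs: List[Config]) -> List[Config]:
    """Configurations not beaten in accuracy by any faster-or-equal configuration."""
    frontier: List[Config] = []
    best_acc = float("-inf")
    # Sort by time ascending; among equal times, consider the most accurate first.
    for c in sorted(configs, key=lambda c: (c.time_ms, -c.acc)):
        if c.acc > best_acc:  # strictly more accurate than everything at least as fast
            frontier.append(c)
            best_acc = c.acc
    return frontier
```

For the hypothetical configurations `a` (10 ms, 20), `b` (20 ms, 18), `d` (25 ms, 31) and `c` (30 ms, 30), the frontier is `[a, d]`: `b` is slower and less accurate than `a`, and `c` is slower and less accurate than `d`. Sorting once and sweeping keeps this O(n log n).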

Bayesian Dark Knowledge

Nov 06, 2015

Anoop Korattikara, Vivek Rathod, Kevin Murphy, Max Welling

Abstract: We consider the problem of Bayesian parameter estimation for deep neural networks, which is important in problem settings where we may have little data, and/or where we need accurate posterior predictive densities, e.g., for applications involving bandits or active learning. One simple approach to this is to use online Monte Carlo methods, such as SGLD (stochastic gradient Langevin dynamics). Unfortunately, such a method needs to store many copies of the parameters (which wastes memory), and needs to make predictions using many versions of the model (which wastes time). We describe a method for "distilling" a Monte Carlo approximation to the posterior predictive density into a more compact form, namely a single deep neural network. We compare to two very recent approaches to Bayesian neural networks, namely an approach based on expectation propagation [Hernandez-Lobato and Adams, 2015] and an approach based on variational Bayes [Blundell et al., 2015]. Our method performs better than both of these, is much simpler to implement, and uses less computation at test time.

* final version submitted to NIPS 2015
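
The distillation idea can be sketched on a toy problem. In the version below, which follows our reading of the abstract and is not the authors' code, a teacher takes one SGLD step on a Bayesian logistic-regression posterior per iteration, and a student logistic-regression model takes one gradient step toward the current teacher sample's predictive probabilities on noise-perturbed training inputs. Accumulated over many iterations, the student approximates the Monte Carlo posterior predictive while storing a single set of weights. The dataset, prior variance, step sizes, perturbation scale and burn-in length are all hypothetical choices.

```python
import math
import random
from typing import Tuple

# Toy 1-D binary classification data; two labels are deliberately "noisy".
XS = [-3.0, -2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0, 3.0]
YS = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def distill(steps: int = 5000, eps: float = 0.01, batch: int = 5,
            prior_var: float = 10.0, student_lr: float = 0.05,
            noise_std: float = 0.5, burn_in: int = 500,
            seed: int = 0) -> Tuple[Tuple[float, float], Tuple[float, float]]:
    rng = random.Random(seed)
    n_total = len(XS)
    w, b = 0.0, 0.0    # current teacher sample (logistic-regression weights)
    ws, bs = 0.0, 0.0  # student weights
    for t in range(steps):
        # Teacher: one SGLD step on log p(w, b | data) with a N(0, prior_var) prior.
        gw, gb = -w / prior_var, -b / prior_var
        scale = n_total / batch
        for i in rng.sample(range(n_total), batch):
            r = YS[i] - sigmoid(w * XS[i] + b)
            gw += scale * r * XS[i]
            gb += scale * r
        w += 0.5 * eps * gw + rng.gauss(0.0, math.sqrt(eps))
        b += 0.5 * eps * gb + rng.gauss(0.0, math.sqrt(eps))
        if t < burn_in:
            continue
        # Student: one SGD step on cross-entropy against the teacher's soft labels,
        # evaluated at randomly perturbed training inputs.
        for i in rng.sample(range(n_total), batch):
            x = XS[i] + rng.gauss(0.0, noise_std)
            target = sigmoid(w * x + b)
            err = target - sigmoid(ws * x + bs)  # minus d(cross-entropy)/d(logit)
            ws += student_lr * err * x
            bs += student_lr * err
    return (w, b), (ws, bs)
```

After `teacher, student = distill()`, only `student` needs to be kept for prediction: one weight vector instead of a stored collection of posterior samples. On this toy data it should assign probability above 0.5 to `x = 2.5` and below 0.5 to `x = -2.5`.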

Large-Scale Distributed Bayesian Matrix Factorization using Stochastic Gradient MCMC

Mar 10, 2015

Sungjin Ahn, Anoop Korattikara, Nathan Liu, Suju Rajan, Max Welling

Abstract: Despite having various attractive qualities such as high prediction accuracy and the ability to quantify uncertainty and avoid over-fitting, Bayesian Matrix Factorization has not been widely adopted because of the prohibitive cost of inference. In this paper, we propose a scalable distributed Bayesian matrix factorization algorithm using stochastic gradient MCMC. Our algorithm, based on Distributed Stochastic Gradient Langevin Dynamics, can not only match the prediction accuracy of standard MCMC methods like Gibbs sampling, but at the same time is as fast and simple as stochastic gradient descent. In our experiments, we show that our algorithm can achieve the same level of prediction accuracy as Gibbs sampling an order of magnitude faster. We also show that our method reduces the prediction error as fast as distributed stochastic gradient descent, achieving a 4.1% improvement in RMSE for the Netflix dataset and a 1.8% improvement for the Yahoo music dataset.
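
The core update can be sketched on a single machine. The code below runs SGLD over the user and item factor matrices of a small Bayesian matrix factorization model with Gaussian likelihood precision `tau` and Gaussian prior precision `lam`, and averages predictions over post-burn-in samples. It does not reproduce the distributed part of the algorithm (splitting the rating matrix into blocks processed by different workers); the function name, rank, step size, precisions and the toy ratings in the usage note are hypothetical.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, item index, rating)


def sgld_mf(ratings: List[Rating], n_users: int, n_items: int, rank: int = 2,
            steps: int = 10000, eps: float = 0.005, batch: int = 4,
            tau: float = 10.0, lam: float = 1.0, burn_in: int = 3000,
            seed: int = 0) -> List[float]:
    """SGLD for Bayesian MF; returns posterior-mean predictions for `ratings`."""
    rng = random.Random(seed)
    U = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_users)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(rank)] for _ in range(n_items)]
    scale = len(ratings) / batch  # rescale the minibatch likelihood gradient
    sums = [0.0] * len(ratings)
    n_samples = 0
    for t in range(steps):
        # Gradient of the Gaussian log-prior N(0, 1/lam) on every factor entry.
        gU = [[-lam * x for x in row] for row in U]
        gV = [[-lam * x for x in row] for row in V]
        # Minibatch estimate of the log-likelihood gradient.
        for i, j, r in rng.sample(ratings, batch):
            err = r - sum(a * c for a, c in zip(U[i], V[j]))
            for k in range(rank):
                gU[i][k] += scale * tau * err * V[j][k]
                gV[j][k] += scale * tau * err * U[i][k]
        # Langevin update: half-step along the gradient plus N(0, eps) noise.
        for P, G in ((U, gU), (V, gV)):
            for row, grow in zip(P, G):
                for k in range(rank):
                    row[k] += 0.5 * eps * grow[k] + rng.gauss(0.0, math.sqrt(eps))
        if t >= burn_in:  # accumulate predictions from post-burn-in samples
            n_samples += 1
            for idx, (i, j, _) in enumerate(ratings):
                sums[idx] += sum(a * c for a, c in zip(U[i], V[j]))
    return [s / n_samples for s in sums]
```

For a toy matrix built as the outer product of `u = [2, -2, 1]` and `v = [1.5, -1, 1.5, -0.5]`, the averaged predictions should fit the observed entries far better than predicting 0 everywhere. Averaging predictions rather than factors sidesteps the rotation and sign ambiguity of U and V.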

Austerity in MCMC Land: Cutting the Metropolis-Hastings Budget

Feb 14, 2014

Anoop Korattikara, Yutian Chen, Max Welling

Abstract: Can we make Bayesian posterior MCMC sampling more efficient when faced with very large datasets? We argue that computing the likelihood for N datapoints in the Metropolis-Hastings (MH) test to reach a single binary decision is computationally inefficient. We introduce an approximate MH rule based on a sequential hypothesis test that allows us to accept or reject samples with high confidence using only a fraction of the data required for the exact MH rule. While this method introduces an asymptotic bias, we show that this bias can be controlled and is more than offset by a decrease in variance due to our ability to draw more samples per unit of time.

* v4 - version accepted by ICML2014
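
The approximate MH rule can be sketched as follows. For a symmetric proposal, the exact test accepts θ' when the mean log-likelihood difference l̄ = (1/N) Σ_i [log p(x_i | θ') - log p(x_i | θ)] exceeds μ0 = (1/N) log[u ρ(θ) / ρ(θ')], with u uniform on (0, 1] and ρ the prior. The sketch grows a random subsample without replacement and stops as soon as the test is confident (tail probability below `eps`) about which side of μ0 the full-data mean lies on. Two points are assumptions of this sketch rather than details taken from the abstract: it uses a normal approximation to the Student-t tail probability, and the toy model (Gaussian mean with a flat prior), batch size, `eps` and proposal width are hypothetical.

```python
import math
import random
from statistics import NormalDist
from typing import Callable, List, Sequence, Tuple

STD_NORMAL = NormalDist()


def approx_mh_accept(theta: float, theta_prop: float, data: Sequence[float],
                     log_lik: Callable[[float, float], float],
                     log_prior: Callable[[float], float],
                     rng: random.Random, eps: float = 0.05,
                     batch: int = 50) -> Tuple[bool, int]:
    """Approximate MH decision for a symmetric proposal (batch must be >= 2).
    Returns (accept, number of datapoints examined)."""
    n_total = len(data)
    u = 1.0 - rng.random()  # uniform on (0, 1], so log(u) is finite
    mu0 = (math.log(u) + log_prior(theta) - log_prior(theta_prop)) / n_total
    order = list(range(n_total))
    rng.shuffle(order)  # subsample without replacement
    s1 = s2 = 0.0
    n = 0
    while True:
        for i in order[n:n + batch]:
            d = log_lik(data[i], theta_prop) - log_lik(data[i], theta)
            s1 += d
            s2 += d * d
        n = min(n + batch, n_total)
        mean = s1 / n
        if n == n_total:
            return mean > mu0, n  # all data seen: this is the exact MH test
        var = max((s2 - s1 * s1 / n) / (n - 1), 0.0)
        # Standard error of the mean with a finite-population correction.
        se = math.sqrt(var / n) * math.sqrt(1.0 - (n - 1) / (n_total - 1))
        if se == 0.0 or 1.0 - STD_NORMAL.cdf(abs(mean - mu0) / se) < eps:
            return mean > mu0, n


def gauss_log_lik(x: float, theta: float) -> float:
    return -0.5 * (x - theta) ** 2  # unit-variance Gaussian, constant dropped


def flat_log_prior(theta: float) -> float:
    return 0.0


def run_chain(data: Sequence[float], steps: int, prop_std: float,
              rng: random.Random) -> Tuple[List[float], float]:
    """Random-walk MH using the approximate test; returns the samples and the
    average fraction of the data examined per accept/reject decision."""
    theta, samples, used = 0.0, [], 0
    for _ in range(steps):
        prop = theta + rng.gauss(0.0, prop_std)
        accept, n = approx_mh_accept(theta, prop, data, gauss_log_lik,
                                     flat_log_prior, rng)
        used += n
        if accept:
            theta = prop
        samples.append(theta)
    return samples, used / (steps * len(data))
```

On 1,000 draws from N(1, 1), the chain should settle near the sample mean while examining, on average, fewer than all N datapoints per decision; most of the savings come from proposals that are clearly better or clearly worse than the current state.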

Bayesian Posterior Sampling via Stochastic Gradient Fisher Scoring

Jun 27, 2012

Sungjin Ahn, Anoop Korattikara, Max Welling

Abstract: In this paper we address the following question: Can we approximately sample from a Bayesian posterior distribution if we are only allowed to touch a small mini-batch of data-items for every sample we generate? An algorithm based on the Langevin equation with stochastic gradients (SGLD) was previously proposed to solve this, but its mixing rate was slow. By leveraging the Bayesian Central Limit Theorem, we extend the SGLD algorithm so that at high mixing rates it will sample from a normal approximation of the posterior, while for slow mixing rates it will mimic the behavior of SGLD with a pre-conditioner matrix. As a bonus, the proposed algorithm is reminiscent of Fisher scoring (with stochastic gradients) and as such an efficient optimizer during burn-in.

* Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012)
