Neil Rabinowitz

Explainability Via Causal Self-Talk

Nov 17, 2022

Nicholas A. Roy, Junkyung Kim, Neil Rabinowitz

Abstract: Explaining the behavior of AI systems is an important problem that, in practice, is generally avoided. While the XAI community has been developing an abundance of techniques, most incur a set of costs that the wider deep learning community has been unwilling to pay in most situations. We take a pragmatic view of the issue, and define a set of desiderata that capture both the ambitions of XAI and the practical constraints of deep learning. We describe an effective way to satisfy all the desiderata: train the AI system to build a causal model of itself. We develop an instance of this solution for Deep RL agents: Causal Self-Talk. CST operates by training the agent to communicate with itself across time. We implement this method in a simulated 3D environment, and show how it enables agents to generate faithful and semantically meaningful explanations of their own behavior. Beyond explanations, we also demonstrate that these learned models provide new ways of building semantic control interfaces to AI systems.

Alchemy: A structured task distribution for meta-reinforcement learning

Feb 04, 2021

Jane X. Wang, Michael King, Nicolas Porcel, Zeb Kurth-Nelson, Tina Zhu, Charlie Deck, Peter Choy, Mary Cassin, Malcolm Reynolds, Francis Song(+7 more)

Abstract: There has been rapidly growing interest in meta-learning as a method for increasing the flexibility and sample efficiency of reinforcement learning. One problem in this area of research, however, has been a scarcity of adequate benchmark tasks. In general, the structure underlying past benchmarks has either been too simple to be inherently interesting, or too ill-defined to support principled analysis. In the present work, we introduce a new benchmark for meta-RL research, which combines structural richness with structural transparency. Alchemy is a 3D video game, implemented in Unity, which involves a latent causal structure that is resampled procedurally from episode to episode, affording structure learning, online inference, hypothesis testing and action sequencing based on abstract domain knowledge. We evaluate a pair of powerful RL agents on Alchemy and present an in-depth analysis of one of these agents. Results clearly indicate a frank and specific failure of meta-learning, providing validation for Alchemy as a challenging benchmark for meta-RL. Concurrent with this report, we are releasing Alchemy as a public resource, together with a suite of analysis tools and sample agent trajectories.

* 16 pages, 9 figures

Making Efficient Use of Demonstrations to Solve Hard Exploration Problems

Sep 03, 2019

Tom Le Paine, Caglar Gulcehre, Bobak Shahriari, Misha Denil, Matt Hoffman, Hubert Soyer, Richard Tanburn, Steven Kapturowski, Neil Rabinowitz, Duncan Williams(+4 more)

Abstract: This paper introduces R2D3, an agent that makes efficient use of demonstrations to solve hard exploration problems in partially observable environments with highly variable initial conditions. We also introduce a suite of eight tasks that combine these three properties, and show that R2D3 can solve several of the tasks where other state-of-the-art methods (both with and without demonstrations) fail to see even a single successful trajectory after tens of billions of steps of exploration.

Meta-learning of Sequential Strategies

May 08, 2019

Pedro A. Ortega, Jane X. Wang, Mark Rowland, Tim Genewein, Zeb Kurth-Nelson, Razvan Pascanu, Nicolas Heess, Joel Veness, Alex Pritzel, Pablo Sprechmann(+14 more)

Abstract: In this report we review memory-based meta-learning as a tool for building sample-efficient strategies that learn from past experience to adapt to any task within a target class. Our goal is to equip the reader with the conceptual foundations of this tool for building new, scalable agents that operate on broad domains. To do so, we present basic algorithmic templates for building near-optimal predictors and reinforcement learners which behave as if they had a probabilistic model that allowed them to efficiently exploit task structure. Furthermore, we recast memory-based meta-learning within a Bayesian framework, showing that the meta-learned strategies are near-optimal because they amortize Bayes-filtered data, where the adaptation is implemented in the memory dynamics as a state-machine of sufficient statistics. Essentially, memory-based meta-learning translates the hard problem of probabilistic sequential inference into a regression problem.

* DeepMind Technical Report (15 pages, 6 figures)

The Predictron: End-To-End Learning and Planning

Jul 20, 2017

David Silver, Hado van Hasselt, Matteo Hessel, Tom Schaul, Arthur Guez, Tim Harley, Gabriel Dulac-Arnold, David Reichert, Neil Rabinowitz, Andre Barreto(+1 more)

Abstract: One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple "imagined" planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures.

* Camera-ready version, ICML 2017, with supplement
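
The accumulation of internal rewards and values over planning depths, as described in the abstract above, could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function name `accumulated_values`, the per-step discount list, and the plain-list inputs are all hypothetical, and in the actual architecture these quantities would be produced by learned networks.

```python
def accumulated_values(rewards, discounts, values):
    """Return one accumulated value estimate per planning depth.

    Assumed (hypothetical) inputs:
      rewards[i]   internal reward produced by imagined step i + 1
      discounts[i] internal discount applied after imagined step i + 1
      values[k]    internal value estimate at depth k (depth 0 = no rollout)
    So len(values) == len(rewards) + 1 == len(discounts) + 1.
    """
    if not (len(values) == len(rewards) + 1 == len(discounts) + 1):
        raise ValueError("expected K rewards, K discounts and K + 1 values")

    estimates = []
    for depth in range(len(values)):
        # Start from the value at this depth and fold the earlier
        # internal rewards back in, innermost step first.
        total = values[depth]
        for step in reversed(range(depth)):
            total = rewards[step] + discounts[step] * total
        estimates.append(total)
    return estimates


if __name__ == "__main__":
    # Two imagined steps, so three depths (0, 1, 2).
    print(accumulated_values([1.0, 2.0], [0.5, 0.5], [10.0, 4.0, 8.0]))
```

In training, each per-depth estimate would be regressed toward the true value target; that step is omitted here because the abstract gives no detail on the loss.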

Overcoming catastrophic forgetting in neural networks

Jan 25, 2017

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska(+4 more)

Abstract: The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence. Neural networks are not, in general, capable of this and it has been widely thought that catastrophic forgetting is an inevitable feature of connectionist models. We show that it is possible to overcome this limitation and train networks that can maintain expertise on tasks which they have not experienced for a long time. Our approach remembers old tasks by selectively slowing down learning on the weights important for those tasks. We demonstrate our approach is scalable and effective by solving a set of classification tasks based on the MNIST handwritten digit dataset and by learning several Atari 2600 games sequentially.
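
One way to picture "selectively slowing down learning on the weights important for those tasks" is a quadratic penalty that anchors each weight to its old-task value in proportion to an importance score. The sketch below is a minimal illustration under assumptions, not the paper's code: the abstract does not say how importance is measured, so `importance` is a hypothetical per-weight score supplied by the caller, and `strength` is a hypothetical trade-off coefficient.

```python
def anchor_penalty(weights, old_weights, importance, strength=1.0):
    """Quadratic cost for moving weights away from their old-task values.

    `importance` is an assumed per-weight score: larger means the weight
    mattered more for the old task and should change more slowly.
    """
    return 0.5 * strength * sum(
        imp * (w - w_old) ** 2
        for w, w_old, imp in zip(weights, old_weights, importance)
    )


def anchor_penalty_gradient(weights, old_weights, importance, strength=1.0):
    """Gradient of `anchor_penalty` with respect to each weight."""
    return [
        strength * imp * (w - w_old)
        for w, w_old, imp in zip(weights, old_weights, importance)
    ]


if __name__ == "__main__":
    current = [1.0, 2.0]
    old = [0.0, 0.0]
    scores = [10.0, 0.0]  # first weight important, second free to move
    print(anchor_penalty(current, old, scores))
    print(anchor_penalty_gradient(current, old, scores))
```

Adding this penalty to the new task's loss pulls important weights back toward their old values while leaving unimportant ones free to adapt, which is the "selective slowing" effect in miniature.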
