Konrad Żołna

AlphaStar Unplugged: Large-Scale Offline Reinforcement Learning

Aug 07, 2023

Michaël Mathieu, Sherjil Ozair, Srivatsan Srinivasan, Caglar Gulcehre, Shangtong Zhang, Ray Jiang, Tom Le Paine, Richard Powell, Konrad Żołna, Julian Schrittwieser (+14 more)

Abstract: StarCraft II is one of the most challenging simulated reinforcement learning environments; it is partially observable, stochastic, and multi-agent, and mastering it requires strategic planning over long time horizons with real-time low-level execution. It also has an active professional competitive scene. StarCraft II is uniquely suited for advancing offline RL algorithms, both because of its challenging nature and because Blizzard has released a massive dataset of millions of StarCraft II games played by human players. This paper leverages that dataset and establishes a benchmark, called AlphaStar Unplugged, introducing unprecedented challenges for offline reinforcement learning. We define a dataset (a subset of Blizzard's release), tools standardizing an API for machine learning methods, and an evaluation protocol. We also present baseline agents, including behavior cloning and offline variants of actor-critic and MuZero. We improve the state of the art of agents using only offline data, and we achieve a 90% win rate against the previously published AlphaStar behavior cloning agent.

* 32 pages, 13 figures, previous version published as a NeurIPS 2021 workshop: https://openreview.net/forum?id=Np8Pumfoty

RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

Jun 20, 2023

Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju (+29 more)

Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a foundation agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming multi-embodiment action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

Critic Regularized Regression

Jun 26, 2020

Ziyu Wang, Alexander Novikov, Konrad Żołna, Jost Tobias Springenberg, Scott Reed, Bobak Shahriari, Noah Siegel, Josh Merel, Caglar Gulcehre, Nicolas Heess (+1 more)

Abstract: Offline reinforcement learning (RL), also known as batch RL, offers the prospect of policy optimization from large pre-recorded datasets without online environment interaction. It addresses challenges with regard to the cost of data collection and safety, both of which are particularly pertinent to real-world applications of RL. Unfortunately, most off-policy algorithms perform poorly when learning from a fixed dataset. In this paper, we propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR). We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces, outperforming several state-of-the-art offline RL algorithms by a significant margin on a wide range of benchmark tasks.

* 23 pages

A Framework for Data-Driven Robotics

Sep 26, 2019

Serkan Cabi, Sergio Gómez Colmenarejo, Alexander Novikov, Ksenia Konyushkova, Scott Reed, Rae Jeong, Konrad Żołna, Yusuf Aytar, David Budden, Mel Vecerik (+6 more)

Abstract: We present a framework for data-driven robotics that makes use of a large dataset of recorded robot experience and scales to several tasks using learned reward functions. We show how to apply this framework to accomplish three different object manipulation tasks on a real robot platform. Given demonstrations of a task together with task-agnostic recorded experience, we use a special form of human annotation as supervision to learn a reward function, which enables us to deal with real-world tasks where the reward signal cannot be acquired directly. Learned rewards are used in combination with a large dataset of experience from different tasks to learn a robot policy offline using batch RL. We show that using our approach it is possible to train agents to perform a variety of challenging manipulation tasks including stacking rigid objects and handling cloth.

Split Batch Normalization: Improving Semi-Supervised Learning under Domain Shift

Apr 06, 2019

Michał Zając, Konrad Żołna, Stanisław Jastrzębski

Abstract: Recent work has shown that using unlabeled data in semi-supervised learning (SSL) is not always beneficial and can even hurt generalization, especially when there is a class mismatch between the unlabeled and labeled examples. We investigate this phenomenon for image classification on the CIFAR-10 and ImageNet datasets, and with many other forms of domain shift applied (e.g. salt-and-pepper noise). Our main contribution is Split Batch Normalization (Split-BN), a technique to improve SSL when the additional unlabeled data comes from a shifted distribution. We achieve this by using separate batch normalization statistics for unlabeled examples. Due to its simplicity, we recommend it as a standard practice. Finally, we analyse how domain shift affects the SSL training process. In particular, we find that during training the statistics of hidden activations in late layers become markedly different between the unlabeled and the labeled examples.

* Under review for ECML PKDD 2019
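
The idea of separate batch normalization statistics for labeled and unlabeled examples can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `batch_norm`, `split_batch_norm`, and `joint_batch_norm` are hypothetical, activations are plain lists of feature rows, and the learnable scale and shift as well as the running statistics used at inference are left out.

```python
from statistics import fmean


def batch_norm(rows, eps=1e-5):
    """Normalize each feature column to zero mean and unit variance over the batch."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    variances = [
        fmean([(x - m) ** 2 for x in col]) for col, m in zip(columns, means)
    ]
    return [
        [(x - m) / (v + eps) ** 0.5 for x, m, v in zip(row, means, variances)]
        for row in rows
    ]


def split_batch_norm(labeled, unlabeled, eps=1e-5):
    """Assumed Split-BN behaviour: each group is normalized with its own statistics."""
    return batch_norm(labeled, eps), batch_norm(unlabeled, eps)


def joint_batch_norm(labeled, unlabeled, eps=1e-5):
    """Baseline for comparison: one set of statistics over the concatenated batch."""
    out = batch_norm(labeled + unlabeled, eps)
    return out[: len(labeled)], out[len(labeled):]


# Toy activations (made up): the unlabeled batch is shifted by +10.
labeled = [[0.0], [2.0]]
unlabeled = [[10.0], [12.0]]

split_labeled, _ = split_batch_norm(labeled, unlabeled)
joint_labeled, _ = joint_batch_norm(labeled, unlabeled)

split_mean = fmean([row[0] for row in split_labeled])
joint_mean = fmean([row[0] for row in joint_labeled])
```

In this toy case the labeled activations stay centred around zero under the split variant, while the joint variant pulls them well below zero because the shifted unlabeled batch dominates the shared mean.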

Adversarial Framing for Image and Video Classification

Dec 11, 2018

Michał Zając, Konrad Żołna, Negar Rostamzadeh, Pedro O. Pinheiro

Abstract: Neural networks are prone to adversarial attacks. In general, such attacks deteriorate the quality of the input by either slightly modifying most of its pixels, or by occluding it with a patch. In this paper, we propose a method that keeps the image unchanged and only adds an adversarial framing on the border of the image. We show empirically that our method is able to successfully attack state-of-the-art methods on both image and video classification problems. Notably, the proposed method results in a universal attack which is very fast at test time. Source code can be found at https://github.com/zajaczajac/adv_framing.

* This is an extended version of the paper that will be published at AAAI 2019
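
How a framing can surround an image without altering its pixels can be sketched as follows. This is only an illustration under stated assumptions, not the code from the linked repository: the helper `apply_framing` is hypothetical, images are 2D lists of pixel values, the framing is assumed to be padded around the image (so the result is larger by `2 * width` per axis), and the optimization that would make the framing adversarial is not shown.

```python
def apply_framing(image, framing, width):
    """Return a copy of `framing` whose interior is overwritten by `image`.

    `framing` must have the image's size plus 2 * width along each axis;
    only its border of thickness `width` survives in the output.
    """
    height, image_width = len(image), len(image[0])
    expected_rows = height + 2 * width
    expected_cols = image_width + 2 * width
    if len(framing) != expected_rows or any(
        len(row) != expected_cols for row in framing
    ):
        raise ValueError("framing must be the image size plus 2 * width per axis")

    framed = [list(row) for row in framing]  # copy so the inputs stay untouched
    for i, row in enumerate(image):
        framed[i + width][width:width + image_width] = row
    return framed


# Toy 2x2 image and a constant framing of width 1 (values are made up).
image = [[1, 2], [3, 4]]
framing = [[9] * 4 for _ in range(4)]
framed = apply_framing(image, framing, width=1)
```

Because the same framing can be pasted around any input of matching size, applying it at test time is a single copy operation, which is consistent with the abstract's description of a universal and fast attack.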
