Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Wei-Di Chang

Generalizable Imitation Learning Through Pre-Trained Representations

Nov 15, 2023

Wei-Di Chang, Francois Hogan, David Meger, Gregory Dudek

Abstract:In this paper we leverage self-supervised vision transformer models and their emergent semantic abilities to improve the generalization abilities of imitation learning policies. We introduce BC-ViT, an imitation learning algorithm that leverages rich DINO pre-trained Visual Transformer (ViT) patch-level embeddings to obtain better generalization when learning through demonstrations. Our learner sees the world by clustering appearance features into semantic concepts, forming stable keypoints that generalize across a wide range of appearance variations and object types. We show that this representation enables generalized behaviour by evaluating imitation learning across a diverse dataset of object manipulation tasks. Our method, data and evaluation approach are made available to facilitate further study of generalization in Imitation Learners.

Via

Access Paper or Ask Questions

Imitation Learning from Observation through Optimal Transport

Oct 02, 2023

Wei-Di Chang, Scott Fujimoto, David Meger, Gregory Dudek

Figure 1 for Imitation Learning from Observation through Optimal Transport

Figure 2 for Imitation Learning from Observation through Optimal Transport

Figure 3 for Imitation Learning from Observation through Optimal Transport

Figure 4 for Imitation Learning from Observation through Optimal Transport

Abstract:Imitation Learning from Observation (ILfO) is a setting in which a learner tries to imitate the behavior of an expert, using only observational data and without the direct guidance of demonstrated actions. In this paper, we re-examine the use of optimal transport for IL, in which a reward is generated based on the Wasserstein distance between the state trajectories of the learner and expert. We show that existing methods can be simplified to generate a reward function without requiring learned models or adversarial learning. Unlike many other state-of-the-art methods, our approach can be integrated with any RL algorithm, and is amenable to ILfO. We demonstrate the effectiveness of this simple approach on a variety of continuous control tasks and find that it surpasses the state of the art in the IlfO setting, achieving expert-level performance across a range of evaluation domains even when observing only a single expert trajectory without actions.

Via

Access Paper or Ask Questions

For SALE: State-Action Representation Learning for Deep Reinforcement Learning

Jun 04, 2023

Scott Fujimoto, Wei-Di Chang, Edward J. Smith, Shixiang Shane Gu, Doina Precup, David Meger

Abstract:In the field of reinforcement learning (RL), representation learning is a proven tool for complex image-based tasks, but is often overlooked for environments with low-level states, such as physical control problems. This paper introduces SALE, a novel approach for learning embeddings that model the nuanced interaction between state and action, enabling effective representation learning from low-level states. We extensively study the design space of these embeddings and highlight important design considerations. We integrate SALE and an adaptation of checkpoints for RL into TD3 to form the TD7 algorithm, which significantly outperforms existing continuous control algorithms. On OpenAI gym benchmark tasks, TD7 has an average performance gain of 276.7% and 50.7% over TD3 at 300k and 5M time steps, respectively, and works in both the online and offline settings.

Via

Access Paper or Ask Questions

Self-Supervised Transformer Architecture for Change Detection in Radio Access Networks

Feb 03, 2023

Igor Kozlov, Dmitriy Rivkin, Wei-Di Chang, Di Wu, Xue Liu, Gregory Dudek

Abstract:Radio Access Networks (RANs) for telecommunications represent large agglomerations of interconnected hardware consisting of hundreds of thousands of transmitting devices (cells). Such networks undergo frequent and often heterogeneous changes caused by network operators, who are seeking to tune their system parameters for optimal performance. The effects of such changes are challenging to predict and will become even more so with the adoption of 5G/6G networks. Therefore, RAN monitoring is vital for network operators. We propose a self-supervised learning framework that leverages self-attention and self-distillation for this task. It works by detecting changes in Performance Measurement data, a collection of time-varying metrics which reflect a set of diverse measurements of the network performance at the cell level. Experimental results show that our approach outperforms the state of the art by 4% on a real-world based dataset consisting of about hundred thousands timeseries. It also has the merits of being scalable and generalizable. This allows it to provide deep insight into the specifics of mode of operation changes while relying minimally on expert knowledge.

* Accepted by 2023 IEEE International Conference on Communications (ICC) Machine Learning for Communications and Networking Track

Via

Access Paper or Ask Questions

IL-flOw: Imitation Learning from Observation using Normalizing Flows

May 19, 2022

Wei-Di Chang, Juan Camilo Gamboa Higuera, Scott Fujimoto, David Meger, Gregory Dudek

Figure 1 for IL-flOw: Imitation Learning from Observation using Normalizing Flows

Figure 2 for IL-flOw: Imitation Learning from Observation using Normalizing Flows

Figure 3 for IL-flOw: Imitation Learning from Observation using Normalizing Flows

Figure 4 for IL-flOw: Imitation Learning from Observation using Normalizing Flows

Abstract:We present an algorithm for Inverse Reinforcement Learning (IRL) from expert state observations only. Our approach decouples reward modelling from policy learning, unlike state-of-the-art adversarial methods which require updating the reward model during policy search and are known to be unstable and difficult to optimize. Our method, IL-flOw, recovers the expert policy by modelling state-state transitions, by generating rewards using deep density estimators trained on the demonstration trajectories, avoiding the instability issues of adversarial methods. We demonstrate that using the state transition log-probability density as a reward signal for forward reinforcement learning translates to matching the trajectory distribution of the expert demonstrations, and experimentally show good recovery of the true reward signal as well as state of the art results for imitation from observation on locomotion and robotic continuous control tasks.

* Presented at the 4th Robot Learning Workshop at NeurIPS 2021

Via

Access Paper or Ask Questions

One-Shot Informed Robotic Visual Search in the Wild

Mar 22, 2020

Karim Koreitem, Florian Shkurti, Travis Manderson, Wei-Di Chang, Juan Camilo Gamboa Higuera, Gregory Dudek

Figure 1 for One-Shot Informed Robotic Visual Search in the Wild

Figure 2 for One-Shot Informed Robotic Visual Search in the Wild

Figure 3 for One-Shot Informed Robotic Visual Search in the Wild

Figure 4 for One-Shot Informed Robotic Visual Search in the Wild

Abstract:We consider the task of underwater robot navigation for the purpose of collecting scientifically-relevant video data for environmental monitoring. The majority of field robots that currently perform monitoring tasks in unstructured natural environments navigate via path-tracking a pre-specified sequence of waypoints. Although this navigation method is often necessary, it is limiting because the robot does not have a model of what the scientist deems to be relevant visual observations. Thus, the robot can neither visually search for particular types of objects, nor focus its attention on parts of the scene that might be more relevant than the pre-specified waypoints and viewpoints. In this paper we propose a method that enables informed visual navigation via a learned visual similarity operator that guides the robot's visual search towards parts of the scene that look like an exemplar image, which is given by the user as a high-level specification for data collection. We propose and evaluate a weakly-supervised video representation learning method that outperforms ImageNet embeddings for similarity tasks in the underwater domain. We also demonstrate the deployment of this similarity operator during informed visual navigation in collaborative environmental monitoring scenarios, in large-scale field trials, where the robot and a human scientist jointly search for relevant visual content.

Via

Access Paper or Ask Questions

OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Nov 24, 2017

Peter Henderson, Wei-Di Chang, Pierre-Luc Bacon, David Meger, Joelle Pineau, Doina Precup

Figure 1 for OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Figure 2 for OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Figure 3 for OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Figure 4 for OptionGAN: Learning Joint Reward-Policy Options using Generative Adversarial Inverse Reinforcement Learning

Abstract:Reinforcement learning has shown promise in learning policies that can solve complex problems. However, manually specifying a good reward function can be difficult, especially for intricate tasks. Inverse reinforcement learning offers a useful paradigm to learn the underlying reward function directly from expert demonstrations. Yet in reality, the corpus of demonstrations may contain trajectories arising from a diverse set of underlying reward functions rather than a single one. Thus, in inverse reinforcement learning, it is useful to consider such a decomposition. The options framework in reinforcement learning is specifically designed to decompose policies in a similar light. We therefore extend the options framework and propose a method to simultaneously recover reward options in addition to policy options. We leverage adversarial methods to learn joint reward-policy options using only observed expert states. We show that this approach works well in both simple and complex continuous control tasks and shows significant performance increases in one-shot transfer learning.

* Accepted to the Thirthy-Second AAAI Conference On Artificial Intelligence (AAAI), 2018

Via

Access Paper or Ask Questions

Underwater Multi-Robot Convoying using Visual Tracking by Detection

Sep 25, 2017

Florian Shkurti, Wei-Di Chang, Peter Henderson, Md Jahidul Islam, Juan Camilo Gamboa Higuera, Jimmy Li, Travis Manderson, Anqi Xu, Gregory Dudek, Junaed Sattar

Figure 1 for Underwater Multi-Robot Convoying using Visual Tracking by Detection

Figure 2 for Underwater Multi-Robot Convoying using Visual Tracking by Detection

Figure 3 for Underwater Multi-Robot Convoying using Visual Tracking by Detection

Figure 4 for Underwater Multi-Robot Convoying using Visual Tracking by Detection

Abstract:We present a robust multi-robot convoying approach that relies on visual detection of the leading agent, thus enabling target following in unstructured 3-D environments. Our method is based on the idea of tracking-by-detection, which interleaves efficient model-based object detection with temporal filtering of image-based bounding box estimation. This approach has the important advantage of mitigating tracking drift (i.e. drifting away from the target object), which is a common symptom of model-free trackers and is detrimental to sustained convoying in practice. To illustrate our solution, we collected extensive footage of an underwater robot in ocean settings, and hand-annotated its location in each frame. Based on this dataset, we present an empirical comparison of multiple tracker variants, including the use of several convolutional neural networks, both with and without recurrent connections, as well as frequency-based model-free trackers. We also demonstrate the practicality of this tracking-by-detection strategy in real-world scenarios by successfully controlling a legged underwater robot in five degrees of freedom to follow another robot's independent motion.

* Accepted to IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017

Via

Access Paper or Ask Questions

Benchmark Environments for Multitask Learning in Continuous Domains

Aug 14, 2017

Peter Henderson, Wei-Di Chang, Florian Shkurti, Johanna Hansen, David Meger, Gregory Dudek

Figure 1 for Benchmark Environments for Multitask Learning in Continuous Domains

Figure 2 for Benchmark Environments for Multitask Learning in Continuous Domains

Figure 3 for Benchmark Environments for Multitask Learning in Continuous Domains

Figure 4 for Benchmark Environments for Multitask Learning in Continuous Domains

Abstract:As demand drives systems to generalize to various domains and problems, the study of multitask, transfer and lifelong learning has become an increasingly important pursuit. In discrete domains, performance on the Atari game suite has emerged as the de facto benchmark for assessing multitask learning. However, in continuous domains there is a lack of agreement on standard multitask evaluation environments which makes it difficult to compare different approaches fairly. In this work, we describe a benchmark set of tasks that we have developed in an extendable framework based on OpenAI Gym. We run a simple baseline using Trust Region Policy Optimization and release the framework publicly to be expanded and used for the systematic comparison of multitask, transfer, and lifelong learning in continuous domains.

* Accepted at Lifelong Learning: A Reinforcement Learning Approach Workshop @ ICML, Sydney, Australia, 2017

Via

Access Paper or Ask Questions