Title: ConserWeightive Behavioral Cloning for Reliable Offline Reinforcement Learning

Oct 11, 2022

Tung Nguyen, Qinqing Zheng, Aditya Grover

Abstract: The goal of offline reinforcement learning (RL) is to learn near-optimal policies from static logged datasets, thus sidestepping expensive online interactions. Behavioral cloning (BC) provides a straightforward solution to offline RL by mimicking offline trajectories via supervised learning. Recent advances (Chen et al., 2021; Janner et al., 2021; Emmons et al., 2021) have shown that, by conditioning on desired future returns, BC can perform competitively with its value-based counterparts while being much simpler and more stable to train. However, the distribution of returns in the offline dataset can be arbitrarily skewed and suboptimal, which poses a unique challenge for conditioning BC on expert returns at test time. We propose ConserWeightive Behavioral Cloning (CWBC), a simple and effective method for improving the performance of conditional BC for offline RL with two key components: trajectory weighting and conservative regularization. Trajectory weighting addresses the bias-variance tradeoff in conditional BC and provides a principled mechanism to learn from both low-return trajectories (typically plentiful) and high-return trajectories (typically few). Further, we analyze the notion of conservatism in existing BC methods and propose a novel conservative regularizer that explicitly encourages the policy to stay close to the data distribution. The regularizer helps achieve more reliable performance and removes the need for ad-hoc tuning of the conditioning value during evaluation. We instantiate CWBC in the context of Reinforcement Learning via Supervised Learning (RvS) (Emmons et al., 2021) and Decision Transformer (DT) (Chen et al., 2021), and empirically show that it significantly boosts the performance and stability of prior methods on various offline RL benchmarks. Code is available at https://github.com/tung-nd/cwbc.
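
To make the idea of trajectory weighting concrete, the sketch below shows one simple way to resample an offline dataset so that high-return trajectories are seen more often while low-return ones are never dropped entirely. It is an illustration only and is not taken from the paper or its repository: the function names, the softmax-over-returns form, and the `temperature` and `uniform_mix` parameters are all assumptions made for this example. A small `temperature` concentrates sampling on the few best trajectories (closer to expert data, but fewer effective samples), while a larger `uniform_mix` pulls the sampler back toward the original data distribution.

```python
import math
import random


def trajectory_weights(returns, temperature=1.0, uniform_mix=0.1):
    """Sampling probabilities that favour high-return trajectories.

    Hypothetical scheme for illustration, not the paper's exact weighting:
    a softmax over range-normalised returns, mixed with a uniform
    distribution so every trajectory keeps a non-zero probability.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    best = max(returns)
    # Normalise by the return range so `temperature` is scale-free.
    spread = (best - min(returns)) or 1.0
    scores = [math.exp((r - best) / (temperature * spread)) for r in returns]
    total = sum(scores)
    n = len(returns)
    return [(1.0 - uniform_mix) * s / total + uniform_mix / n for s in scores]


def sample_trajectory_indices(returns, batch_size, rng=None, **kwargs):
    """Draw trajectory indices for one training batch using the weights."""
    rng = rng or random.Random(0)
    probs = trajectory_weights(returns, **kwargs)
    return rng.choices(range(len(returns)), weights=probs, k=batch_size)


if __name__ == "__main__":
    # Toy dataset: many low-return trajectories, a few high-return ones.
    toy_returns = [10.0] * 8 + [90.0, 100.0]
    print(trajectory_weights(toy_returns, temperature=0.5))
    print(sample_trajectory_indices(toy_returns, batch_size=5, temperature=0.5))
```

In a conditional BC training loop, the sampled indices would select which trajectories supply the (state, target return, action) tuples for each supervised update; the loss itself is unchanged.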

View paper on OpenReview
