Title: SelfBC: Self Behavior Cloning for Offline Reinforcement Learning

Aug 04, 2024

Shirong Liu, Chenjia Bai, Zixian Guo, Hao Zhang, Gaurav Sharma, Yang Liu

Abstract: Policy constraint methods in offline reinforcement learning add regularization terms that limit the discrepancy between the learned policy and the offline dataset. However, these methods tend to produce overly conservative policies that resemble the behavior policy, which limits their performance. We investigate this limitation and attribute it to the static nature of traditional constraints. In this paper, we propose a novel dynamic policy constraint that constrains the learned policy using samples generated by the exponential moving average of previously learned policies. By integrating this self-constraint mechanism into off-policy methods, our method facilitates the learning of non-conservative policies while avoiding policy collapse in the offline setting. Theoretical results show that our approach yields a nearly monotonically improving reference policy. Extensive experiments on the D4RL MuJoCo domain demonstrate that our proposed method achieves state-of-the-art performance among policy constraint methods.
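The dynamic constraint described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the exponential moving average is taken over policy parameters, that the policy is a deterministic linear map with tanh squashing, and that the constraint is a squared-error penalty toward the reference policy's actions. The names `ema_update`, `policy_action`, `self_bc_penalty`, and the rate `tau` are all hypothetical.

```python
import numpy as np

def ema_update(ref_params, params, tau=0.005):
    """Move each reference parameter a fraction tau toward the current policy.

    Assumption: the EMA is applied in parameter space; tau is a hypothetical rate.
    """
    return {k: (1.0 - tau) * ref_params[k] + tau * params[k] for k in params}

def policy_action(params, states):
    """Hypothetical deterministic policy: linear layer followed by tanh."""
    return np.tanh(states @ params["W"] + params["b"])

def self_bc_penalty(params, ref_params, states):
    """Mean squared distance between learned-policy and reference-policy actions.

    A static constraint would compare against dataset actions instead;
    here the target comes from the slowly moving reference policy.
    """
    actions = policy_action(params, states)
    ref_actions = policy_action(ref_params, states)
    return float(np.mean(np.sum((actions - ref_actions) ** 2, axis=-1)))

# Toy setup: 3-dimensional states, 2-dimensional actions.
rng = np.random.default_rng(0)
params = {"W": rng.normal(size=(3, 2)), "b": np.zeros(2)}
ref_params = {k: v.copy() for k, v in params.items()}
states = rng.normal(size=(8, 3))
```

In a training loop under these assumptions, the penalty would be added to the actor loss of an off-policy method, and `ema_update` would be called after each policy step so that the reference policy tracks the learned policy slowly rather than staying fixed at the behavior policy.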
