Sami Nur Islam

Langevin Soft Actor-Critic: Efficient Exploration through Uncertainty-Driven Critic Learning

Jan 29, 2025

Haque Ishfaq, Guangyuan Wang, Sami Nur Islam, Doina Precup

Abstract: Existing actor-critic algorithms, which are popular for continuous control reinforcement learning (RL) tasks, suffer from poor sample efficiency due to the lack of a principled exploration mechanism. Motivated by the success of Thompson sampling for efficient exploration in RL, we propose a novel model-free RL algorithm, Langevin Soft Actor Critic (LSAC), which prioritizes enhancing critic learning through uncertainty estimation over policy optimization. LSAC employs three key innovations: approximate Thompson sampling through distributional Langevin Monte Carlo (LMC) based $Q$ updates, parallel tempering for exploring multiple modes of the posterior of the $Q$ function, and diffusion-synthesized state-action samples regularized with $Q$ action gradients. Our extensive experiments demonstrate that LSAC outperforms or matches the performance of mainstream model-free RL algorithms for continuous control tasks. Notably, LSAC marks the first successful application of LMC-based Thompson sampling in continuous control tasks with continuous action spaces.

* Published in The Thirteenth International Conference on Learning Representations (ICLR) 2025. The first two authors contributed equally
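
The abstract refers to Langevin Monte Carlo (LMC) based updates without spelling out the update rule. As a rough illustration only, the sketch below shows the generic LMC step (a gradient step plus scaled Gaussian noise) on a toy quadratic loss. It is not the LSAC algorithm: the function names, the quadratic stand-in for a critic loss, and all hyperparameter values are assumptions made for this example, and the distributional critic, parallel tempering, and diffusion components are omitted.

```python
import math
import random


def lmc_step(theta, grad, step_size, inverse_temperature, rng):
    """One generic Langevin Monte Carlo step: gradient descent plus Gaussian noise."""
    noise_scale = math.sqrt(2.0 * step_size / inverse_temperature)
    return [
        t - step_size * g + noise_scale * rng.gauss(0.0, 1.0)
        for t, g in zip(theta, grad)
    ]


def quadratic_loss_grad(theta, target):
    # Gradient of 0.5 * ||theta - target||^2, a toy stand-in for a critic loss
    # (assumption for illustration; a real critic loss would come from TD errors).
    return [t - y for t, y in zip(theta, target)]


def run_chain(target, steps=5000, step_size=0.01, inverse_temperature=100.0, seed=0):
    """Run an LMC chain and return every visited parameter vector."""
    rng = random.Random(seed)
    theta = [0.0 for _ in target]
    samples = []
    for _ in range(steps):
        grad = quadratic_loss_grad(theta, target)
        theta = lmc_step(theta, grad, step_size, inverse_temperature, rng)
        samples.append(list(theta))
    return samples


samples = run_chain(target=[1.0, -2.0])
# Discard the first half as burn-in, then average the remaining samples.
tail = samples[len(samples) // 2:]
mean = [sum(s[i] for s in tail) / len(tail) for i in range(2)]
```

The injected noise keeps the parameters from collapsing to a single point estimate, so the chain's iterates behave like approximate posterior samples. This is the property that makes LMC a candidate for approximate Thompson sampling.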

Code as Reward: Empowering Reinforcement Learning with VLMs

Feb 07, 2024

David Venuto, Sami Nur Islam, Martin Klissarov, Doina Precup, Sherry Yang, Ankit Anand

Abstract: Pre-trained Vision-Language Models (VLMs) are able to understand visual concepts, describe and decompose complex tasks into sub-tasks, and provide feedback on task completion. In this paper, we aim to leverage these capabilities to support the training of reinforcement learning (RL) agents. In principle, VLMs are well suited for this purpose, as they can naturally analyze image-based observations and provide feedback (reward) on learning progress. However, inference in VLMs is computationally expensive, so querying them frequently to compute rewards would significantly slow down the training of an RL agent. To address this challenge, we propose a framework named Code as Reward (VLM-CaR). VLM-CaR produces dense reward functions from VLMs through code generation, thereby significantly reducing the computational burden of querying the VLM directly. We show that the dense rewards generated through our approach are very accurate across a diverse set of discrete and continuous environments, and can be more effective in training RL policies than the original sparse environment rewards.
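
To make the idea of a reward expressed as code more concrete, here is a minimal hypothetical sketch. It assumes a key-and-door task decomposed into three ordered sub-tasks, each checked by a small program, with the dense reward defined as the fraction of sub-tasks completed in order. The check functions are hand-written here and the observation is a symbolic dictionary rather than an image. The task, the names, and the reward definition are all assumptions for illustration, not the VLM-CaR implementation.

```python
from typing import Callable, Dict, List

# Hypothetical symbolic observation; the paper's setting uses image-based
# observations, which this sketch does not model.
Observation = Dict[str, object]


def has_key(obs: Observation) -> bool:
    return bool(obs["has_key"])


def door_open(obs: Observation) -> bool:
    return bool(obs["door_open"])


def at_goal(obs: Observation) -> bool:
    return obs["agent_pos"] == obs["goal_pos"]


# Ordered sub-task checks. In a code-as-reward setup these programs would be
# generated once and then run cheaply at every step (assumption: hand-written here).
SUBTASK_CHECKS: List[Callable[[Observation], bool]] = [has_key, door_open, at_goal]


def dense_reward(obs: Observation,
                 checks: List[Callable[[Observation], bool]] = SUBTASK_CHECKS) -> float:
    """Return the fraction of sub-tasks completed in order."""
    completed = 0
    for check in checks:
        if not check(obs):
            break
        completed += 1
    return completed / len(checks)


start = {"has_key": False, "door_open": False, "agent_pos": (0, 0), "goal_pos": (3, 3)}
with_key = dict(start, has_key=True)
done = dict(start, has_key=True, door_open=True, agent_pos=(3, 3))

rewards = [dense_reward(o) for o in (start, with_key, done)]
```

Because the checks are ordinary functions, evaluating the reward at every environment step costs a few comparisons rather than a model query, which is the computational saving the abstract describes.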
