Abstract: The ability to explore efficiently and effectively is a central challenge of reinforcement learning. In this work, we consider exploration through the lens of information theory. Specifically, we cast exploration as a problem of maximizing the Shannon entropy of the state occupation measure. This is done by maximizing a sequence of divergences between distributions representing an agent's past behavior and its current behavior. Intuitively, this encourages the agent to explore new behaviors that are distinct from its past behaviors. Hence, we call our method RAMP, for ``$\textbf{R}$unning $\textbf{A}$way fro$\textbf{m}$ the $\textbf{P}$ast.'' A fundamental question for this method is how to quantify the change in distribution over time. We consider both the Kullback-Leibler divergence and the Wasserstein distance to quantify the divergence between successive state occupation measures, and explain why the former may lead to undesirable exploratory behaviors in some tasks. We demonstrate that an agent encouraged to explore by actively distancing itself from its past experiences can effectively explore mazes and exhibit a wide range of behaviors on robotic manipulation and locomotion tasks.
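As a minimal sketch of this objective (assumed notation, not necessarily the paper's exact formulation): writing $d^{\pi_k}$ for the state occupation measure of the policy at iteration $k$, each update seeks
\[
\pi_{k+1} \in \arg\max_{\pi} \, D\!\left(d^{\pi}, d^{\pi_k}\right), \qquad D \in \{ D_{\mathrm{KL}},\, W \},
\]
where $D_{\mathrm{KL}}$ and $W$ denote the Kullback-Leibler divergence and the Wasserstein distance, respectively. In the Kullback-Leibler case this amounts, up to the dependence of the reward on $\pi$ itself, to an intrinsic reward $r(s) = \log d^{\pi}(s) - \log d^{\pi_k}(s)$ that is highest in states the past policy rarely visited.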
Abstract: The ability to perform different skills can encourage agents to explore. In this work, we aim to construct a set of diverse skills that uniformly covers the state space. We propose a formalization of this search for diverse skills, building on a previous definition based on the mutual information between states and skills. We consider the distribution of states reached by a policy conditioned on each skill and leverage the successor state measure to maximize the difference between these skill-conditioned distributions. We call this approach LEADS: Learning Diverse Skills through Successor States. We evaluate our approach on a set of maze navigation and robotic control tasks, showing that our method constructs a diverse set of skills that exhaustively cover the state space without relying on reward or exploration bonuses. Our findings demonstrate that this new formalization promotes more robust and efficient exploration than approaches that combine mutual information maximization with exploration bonuses.
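As a sketch of the quantity being built upon (a standard decomposition; the skill variable $Z$ and reached-state variable $S$ are assumed notation), the mutual information between states and skills factors as
\[
I(S; Z) = H(S) - H(S \mid Z),
\]
so a diverse skill set should jointly cover the state space (high $H(S)$) while each individual skill reaches its own distinct region (low $H(S \mid Z)$). In LEADS, the skill-conditioned state distributions entering this objective are estimated through the successor state measure, and the difference between them is maximized directly.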
Abstract: When searching for policies, reward-sparse environments often lack sufficient information about which behaviors to improve upon or avoid. In such environments, the policy search process can only search blindly for reward-yielding transitions, and no early reward can bias this search in one direction or another. A way to overcome this is to use intrinsic motivation to explore new transitions until a reward is found. In this work, we use a recently proposed definition of intrinsic motivation, Curiosity, in an evolutionary policy search method. We propose Curiosity-ES, an evolution strategy adapted to use Curiosity as a fitness metric. We compare Curiosity with Novelty, a commonly used diversity metric, and find that Curiosity generates higher diversity over full episodes without the need for an explicit diversity criterion, leading to multiple policies that find reward.
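As a minimal sketch of such a fitness (assuming a Curiosity definition based on a learned forward model $f$ over state embeddings $\phi$, as in common intrinsic-motivation formulations; the exact definition used here may differ), the fitness of an individual $\theta$ over an episode $(s_0, a_0, \ldots, s_T)$ generated by $\pi_\theta$ accumulates the model's prediction error,
\[
F(\theta) = \sum_{t=0}^{T-1} \left\| f\big(\phi(s_t), a_t\big) - \phi(s_{t+1}) \right\|^2,
\]
so that individuals visiting transitions the forward model predicts poorly, i.e., novel transitions, receive higher fitness and are preferentially selected by the evolution strategy.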