Marc Höftmann

Backward Learning for Goal-Conditioned Policies

Dec 08, 2023

Marc Höftmann, Jan Robine, Stefan Harmeling

Abstract: Can we learn policies in reinforcement learning without rewards? Can we learn a policy just by trying to reach a goal state? We answer these questions positively by proposing a multi-step procedure that first learns a world model that goes backward in time, second generates goal-reaching backward trajectories, third improves those sequences using shortest-path-finding algorithms, and finally trains a neural network policy by imitation learning. We evaluate our method on a deterministic maze environment where the observations are $64\times 64$ pixel bird's-eye images, and show that it consistently reaches several goals.

* World Models, Goal-conditioned, Reward-free, Workshop on Goal-Conditioned Reinforcement Learning - NeurIPS 2023
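
The abstract does not name the shortest-path algorithm or the data format, so the following is only a minimal sketch of steps three and four under stated assumptions: states are discrete and hashable (e.g., maze cells rather than 64×64 images), the generated backward trajectories have already been flipped into forward-time (state, action, next_state) transitions, and breadth-first search serves as the shortest-path routine. The helpers `build_graph` and `shortest_path_dataset` are hypothetical names. Merging all trajectories into one graph lets the search discard detours and loops before the (state, goal, action) triples are handed to imitation learning.

```python
from collections import deque


def build_graph(trajectories):
    """Merge forward-time transitions from all trajectories into one graph.

    Assumption: each trajectory is a list of (state, action, next_state)
    tuples with hashable states; graph[s][s_next] = action.
    """
    graph = {}
    for traj in trajectories:
        for s, a, s_next in traj:
            graph.setdefault(s, {})[s_next] = a
    return graph


def shortest_path_dataset(graph, goal):
    """Run BFS backward from the goal and return (state, goal, action) triples
    where each action lies on a shortest known route to the goal."""
    # Reverse the edges so the search can start at the goal.
    reverse = {}
    for s, neighbours in graph.items():
        for s_next in neighbours:
            reverse.setdefault(s_next, []).append(s)

    dist = {goal: 0}
    next_hop = {}  # state -> successor one step closer to the goal
    queue = deque([goal])
    while queue:
        v = queue.popleft()
        for u in reverse.get(v, []):
            if u not in dist:
                dist[u] = dist[v] + 1
                next_hop[u] = v
                queue.append(u)

    # Imitation target: the action that moves from u to its next hop.
    return [(u, goal, graph[u][next_hop[u]]) for u in next_hop]
```

For example, a trajectory that wanders from (0, 1) down to (1, 1) and back before reaching the goal (0, 2) yields the action "R" at (0, 1), because the loop is not on the shortest route.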

A Survey on Self-Supervised Representation Learning

Aug 22, 2023

Tobias Uelwer, Jan Robine, Stefan Sylvius Wagner, Marc Höftmann, Eric Upschulte, Sebastian Konietzny, Maike Behrendt, Stefan Harmeling

Abstract: Learning meaningful representations is at the heart of many tasks in modern machine learning. Recently, many methods have been introduced that allow learning image representations without supervision. These representations can then be used in downstream tasks like classification or object detection. Their quality is close to that of supervised learning, while no labeled images are needed. This survey paper provides a comprehensive review of these methods in a unified notation, points out similarities and differences between them, and proposes a taxonomy that sets them in relation to each other. Furthermore, our survey summarizes the most recent experimental results reported in the literature in the form of a meta-study. Our survey is intended as a starting point for researchers and practitioners who want to dive into the field of representation learning.
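
As an illustration only, and not in the survey's unified notation, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss, one widely used family of self-supervised objectives. It assumes two batches of embeddings of the same N images under different augmentations, where row i of `z1` and row i of `z2` form the positive pair and the remaining rows act as negatives. The function name and the default temperature are hypothetical choices.

```python
import numpy as np


def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE-style loss for two (N, d) embedding batches.

    Assumption: row i of z1 and row i of z2 embed two augmented views of
    the same image (positive pair); all other rows serve as negatives.
    """
    # L2-normalise so that dot products are cosine similarities.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature  # (N, N) similarity matrix

    # Numerically stable row-wise log-softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Positives sit on the diagonal; maximise their log-probability.
    return -np.mean(np.diag(log_probs))
```

Matching views produce a loss near zero, while mismatched pairings are penalised heavily, which is what pushes the encoder toward augmentation-invariant representations without labels.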

Transformer-based World Models Are Happy With 100k Interactions

Mar 13, 2023

Jan Robine, Marc Höftmann, Tobias Uelwer, Stefan Harmeling

Abstract: Deep neural networks have been successful in many reinforcement learning settings. However, compared to human learners, they are overly data-hungry. To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. The transformer allows our world model to access previous states directly, instead of viewing them through a compressed recurrent state. By utilizing the Transformer-XL architecture, it is able to learn long-term dependencies while staying computationally efficient. Our transformer-based world model (TWM) generates meaningful new experience, which is used to train a policy that outperforms previous model-free and model-based reinforcement learning algorithms on the Atari 100k benchmark.

* Published as a conference paper at ICLR 2023. Code is available at https://github.com/jrobine/twm
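
To make "attend flexibly to all three modalities at different time steps" concrete, here is a minimal NumPy sketch under stated assumptions: each latent state, action, and reward has already been embedded into a d-dimensional vector, and the tokens are interleaved per time step in the order state, action, reward. The ordering and the helper names are assumptions for illustration, and the sketch omits Transformer-XL's segment-level memory; the linked repository holds the actual TWM implementation.

```python
import numpy as np


def interleave_tokens(latents, actions, rewards):
    """Build the sequence z_1, a_1, r_1, z_2, a_2, r_2, ...

    Assumption: each input is a (T, d) array of already-embedded tokens.
    Returns an array of shape (3 * T, d).
    """
    T, d = latents.shape
    seq = np.stack([latents, actions, rewards], axis=1)  # (T, 3, d)
    return seq.reshape(3 * T, d)


def causal_mask(length):
    """Boolean mask where True means attention is allowed: each token may
    attend to itself and to every earlier token, never to future ones."""
    return np.tril(np.ones((length, length), dtype=bool))
```

With a causal mask over the interleaved sequence, the token at step t can look up any earlier state, action, or reward directly, instead of relying on a single recurrent summary of the past.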

Time-Myopic Go-Explore: Learning A State Representation for the Go-Explore Paradigm

Jan 13, 2023

Marc Höftmann, Jan Robine, Stefan Harmeling

Abstract: Very large state spaces with a sparse reward signal are difficult to explore. The lack of sophisticated guidance results in poor performance for numerous reinforcement learning algorithms. In these cases, the commonly used random exploration is often not helpful. The literature shows that such environments require enormous effort to systematically explore large chunks of the state space. Learned state representations can help here to improve the search by providing semantic context and building a structure on top of the raw observations. In this work we introduce a novel time-myopic state representation that clusters temporally close states together while providing a time prediction capability between them. By adapting this model to the Go-Explore paradigm (Ecoffet et al., 2021b), we demonstrate the first learned state representation that reliably estimates novelty instead of relying on the hand-crafted representation heuristic. Our method offers an improved solution to the detachment problem, which still remains an issue in the Go-Explore exploration phase. We provide evidence that our proposed method covers the entire state space with respect to all possible time trajectories without causing disadvantageous conflict-overlaps in the cell archive. Analogous to native Go-Explore, our approach is evaluated on the hard-exploration environments MontezumaRevenge, Gravitar and Frostbite (Atari) in order to validate its capabilities on difficult tasks. Our experiments show that time-myopic Go-Explore is an effective alternative to the domain-engineered heuristic while also being more general. The source code of the method is available on GitHub.

* 9 pages, 7 figures, Deep Reinforcement Learning Workshop, NeurIPS 2022, OpenReview
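
The abstract does not spell out how the learned representation decides novelty, so the following is only a minimal sketch of how a time-predicting model could gate additions to a Go-Explore cell archive. It assumes that states are already embedded as vectors, that a new state counts as novel when every archived cell is predicted to be at least `threshold` steps away, and, as one reading of "time-myopic", that predictions are capped at a fixed `horizon`. The learned predictor is replaced here by a hypothetical stand-in (capped Euclidean distance in embedding space); all names and default values are illustrative.

```python
import numpy as np


def predicted_time(emb_a, emb_b, horizon=10.0):
    """Hypothetical stand-in for a learned time-distance predictor:
    embedding distance, capped at `horizon` so that only temporally
    close states are told apart (the assumed 'myopic' behaviour)."""
    return min(float(np.linalg.norm(emb_a - emb_b)), horizon)


def is_novel(emb, archive, threshold=5.0, horizon=10.0):
    """Novel if every archived cell is predicted to be >= threshold steps away."""
    return all(predicted_time(emb, cell, horizon) >= threshold for cell in archive)


def update_archive(emb, archive, threshold=5.0, horizon=10.0):
    """Append the embedding as a new cell when it is novel; report whether it was added."""
    if is_novel(emb, archive, threshold, horizon):
        archive.append(emb)
        return True
    return False
```

Because the decision depends only on predicted time distances rather than on a hand-crafted downsampling of frames, the same archive logic applies unchanged across games, which mirrors the generality argument in the abstract.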
