Oliver Groth

Improving cosmological reach of a gravitational wave observatory using Deep Loop Shaping

Sep 17, 2025

Jonas Buchli, Brendan Tracey, Tomislav Andric, Christopher Wipf, Yu Him Justin Chiu, Matthias Lochbrunner, Craig Donner, Rana X. Adhikari, Jan Harms, Iain Barr(+21 more)

Abstract: Improved low-frequency sensitivity of gravitational wave observatories would unlock the study of intermediate-mass black hole mergers and binary black hole eccentricity, and would provide early warnings for multi-messenger observations of binary neutron star mergers. Today's mirror stabilization control injects harmful noise, constituting a major obstacle to sensitivity improvements. We eliminated this noise through Deep Loop Shaping, a reinforcement learning method using frequency-domain rewards. We proved our methodology on the LIGO Livingston Observatory (LLO). Our controller reduced control noise in the 10-30 Hz band by over 30x, and by up to 100x in sub-bands, surpassing the design goal motivated by the quantum limit. These results highlight the potential of Deep Loop Shaping to improve current and future GW observatories, and more broadly instrumentation and control systems.

* Science 389, 6764 (2025) 1012-1015

Offline Actor-Critic Reinforcement Learning Scales to Large Models

Feb 08, 2024

Jost Tobias Springenberg, Abbas Abdolmaleki, Jingwei Zhang, Oliver Groth, Michael Bloesch, Thomas Lampe, Philemon Brakel, Sarah Bechtle, Steven Kapturowski, Roland Hafner(+2 more)

Abstract: We show that offline actor-critic reinforcement learning can scale to large models, such as transformers, and follows scaling laws similar to those of supervised learning. We find that offline actor-critic algorithms can outperform strong, supervised, behavioral cloning baselines for multi-task training on a large dataset containing both sub-optimal and expert behavior on 132 continuous control tasks. We introduce a Perceiver-based actor-critic model and elucidate the key model features needed to make offline RL work with self- and cross-attention modules. Overall, we find that: i) simple offline actor-critic algorithms are a natural choice for gradually moving away from the currently predominant paradigm of behavioral cloning, and ii) via offline RL it is possible to learn multi-task policies that master many domains simultaneously, including real robotics tasks, from sub-optimal demonstrations or self-generated data.

Mastering Stacking of Diverse Shapes with Large-Scale Iterative Reinforcement Learning on Real Robots

Dec 18, 2023

Thomas Lampe, Abbas Abdolmaleki, Sarah Bechtle, Sandy H. Huang, Jost Tobias Springenberg, Michael Bloesch, Oliver Groth, Roland Hafner, Tim Hertweck, Michael Neunert(+5 more)

Abstract: Reinforcement learning solely from an agent's self-generated data is often believed to be infeasible for learning on real robots, due to the amount of data needed. However, if done right, agents learning from real data can be surprisingly efficient through re-using previously collected sub-optimal data. In this paper we demonstrate how the increased understanding of off-policy learning methods and their embedding in an iterative online/offline scheme ("collect and infer") can drastically improve data efficiency by using all the collected experience, which empowers learning from real robot experience only. Moreover, the resulting policy improves significantly over the state of the art on a recently proposed real robot manipulation benchmark. Our approach learns end-to-end, directly from pixels, and does not rely on additional human domain knowledge such as a simulator or demonstrations.

RoboCat: A Self-Improving Foundation Agent for Robotic Manipulation

Jun 20, 2023

Konstantinos Bousmalis, Giulia Vezzani, Dushyant Rao, Coline Devin, Alex X. Lee, Maria Bauza, Todor Davchev, Yuxiang Zhou, Agrim Gupta, Akhil Raju(+29 more)

Abstract: The ability to leverage heterogeneous robotic experience from different robots and tasks to quickly master novel skills and embodiments has the potential to transform robot learning. Inspired by recent advances in foundation models for vision and language, we propose a foundation agent for robotic manipulation. This agent, named RoboCat, is a visual goal-conditioned decision transformer capable of consuming multi-embodiment action-labelled visual experience. This data spans a large repertoire of motor control skills from simulated and real robotic arms with varying sets of observations and actions. With RoboCat, we demonstrate the ability to generalise to new tasks and robots, both zero-shot and through adaptation using only 100-1000 examples for the target task. We also show how a trained model itself can be used to generate data for subsequent training iterations, thus providing a basic building block for an autonomous improvement loop. We investigate the agent's capabilities, with large-scale evaluations both in simulation and on three different real robot embodiments. We find that as we grow and diversify its training data, RoboCat not only shows signs of cross-task transfer, but also becomes more efficient at adapting to new tasks.

Unlocking the Power of Representations in Long-term Novelty-based Exploration

May 02, 2023

Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot

Abstract: We introduce Robust Exploration via Clustering-based Online Density Estimation (RECODE), a non-parametric method for novelty-based exploration that estimates visitation counts for clusters of states based on their similarity in a chosen embedding space. By adapting classical clustering to the nonstationary setting of Deep RL, RECODE can efficiently track state visitation counts over thousands of episodes. We further propose a novel generalization of the inverse dynamics loss, which leverages masked transformer architectures for multi-step prediction and which, in conjunction with RECODE, achieves a new state of the art in a suite of challenging 3D-exploration tasks in DM-Hard-8. RECODE also sets a new state of the art in hard exploration Atari games, and is the first agent to reach the end screen in "Pitfall!".

Is Curiosity All You Need? On the Utility of Emergent Behaviours from Curious Exploration

Sep 17, 2021

Oliver Groth, Markus Wulfmeier, Giulia Vezzani, Vibhavari Dasagi, Tim Hertweck, Roland Hafner, Nicolas Heess, Martin Riedmiller

Abstract: Curiosity-based reward schemes can present powerful exploration mechanisms which facilitate the discovery of solutions for complex, sparse or long-horizon tasks. However, as the agent learns to reach previously unexplored spaces and the objective adapts to reward new areas, many behaviours emerge only to disappear due to being overwritten by the constantly shifting objective. We argue that merely using curiosity for fast environment exploration or as a bonus reward for a specific task does not harness the full potential of this technique and misses useful skills. Instead, we propose to shift the focus towards retaining the behaviours which emerge during curiosity-based learning. We posit that these self-discovered behaviours serve as valuable skills in an agent's repertoire to solve related tasks. Our experiments demonstrate the continuous shift in behaviour throughout training and the benefits of a simple policy snapshot method to reuse discovered behaviour for transfer tasks.

* 14 pages, 7 figures, 2 tables

RELATE: Physically Plausible Multi-Object Scene Synthesis Using Structured Latent Spaces

Jul 02, 2020

Sebastien Ehrhardt, Oliver Groth, Aron Monszpart, Martin Engelcke, Ingmar Posner, Niloy Mitra, Andrea Vedaldi

Abstract: We present RELATE, a model that learns to generate physically plausible scenes and videos of multiple interacting objects. Similar to other generative approaches, RELATE is trained end-to-end on raw, unlabeled data. RELATE combines an object-centric GAN formulation with a model that explicitly accounts for correlations between individual objects. This allows the model to generate realistic scenes and videos from a physically interpretable parameterization. Furthermore, we show that modeling the object correlation is necessary to learn to disentangle object positions and identity. We find that RELATE is also amenable to physically realistic scene editing and that it significantly outperforms prior art in object-centric scene generation in both synthetic (CLEVR, ShapeStacks) and real-world data (street traffic scenes). In addition, in contrast to state-of-the-art methods in object-centric generative modeling, RELATE also extends naturally to dynamic scenes and generates videos of high visual fidelity.

Goal-Conditioned End-to-End Visuomotor Control for Versatile Skill Primitives

Mar 19, 2020

Oliver Groth, Chia-Man Hung, Andrea Vedaldi, Ingmar Posner

Abstract: Visuomotor control (VMC) is an effective means of achieving basic manipulation tasks such as pushing or pick-and-place from raw images. Conditioning VMC on desired goal states is a promising way of achieving versatile skill primitives. However, common conditioning schemes either rely on task-specific fine-tuning (e.g. using meta-learning) or on sampling approaches using a forward model of scene dynamics, i.e. model-predictive control, leaving deployability and planning horizon severely limited. In this paper we propose a conditioning scheme which avoids these pitfalls by learning the controller and its conditioning in an end-to-end manner. Our model predicts complex action sequences based directly on a dynamic image representation of the robot motion and the distance to a given target observation. In contrast to related works, this enables our approach to efficiently perform complex pushing and pick-and-place tasks from raw image observations without predefined control primitives. We report significant improvements in task success over a representative model-predictive controller and also demonstrate our model's generalisation capabilities in challenging, unseen tasks handling unfamiliar objects.

* 15 pages, 10 figures, 9 tables; supplementary video available: https://youtu.be/zn_lPor9zCU

Imagine That! Leveraging Emergent Affordances for Tool Synthesis in Reaching Tasks

Nov 06, 2019

Yizhe Wu, Sudhanshu Kasewa, Oliver Groth, Sasha Salter, Li Sun, Oiwi Parker Jones, Ingmar Posner

Abstract: In this paper we investigate an artificial agent's ability to perform task-focused tool synthesis via imagination. Our motivation is to explore the richness of information captured by the latent space of an object-centric generative model, and how to exploit it. In particular, our approach employs activation maximisation of a task-based performance predictor to optimise the latent variable of a structured latent-space model in order to generate tool geometries appropriate for the task at hand. We evaluate our model using a novel dataset of synthetic reaching tasks inspired by the cognitive sciences and behavioural ecology. In doing so we examine the model's ability to imagine tools for increasingly complex scenario types, beyond those seen during training. Our experiments demonstrate that the synthesis process modifies emergent, task-relevant object affordances in a targeted and deliberate way: the agents often specifically modify aspects of the tools which relate to meaningful (yet implicitly learned) concepts such as a tool's length, width and configuration. Our results therefore suggest that task-relevant object affordances are implicitly encoded as directions in a structured latent space shaped by experience.

* 17 pages, 6 figures

ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking

Jul 06, 2018

Oliver Groth, Fabian B. Fuchs, Ingmar Posner, Andrea Vedaldi

Abstract: Physical intuition is pivotal for intelligent agents to perform complex tasks. In this paper we investigate the passive acquisition of an intuitive understanding of physical principles as well as the active utilisation of this intuition in the context of generalised object stacking. To this end, we provide a simulation-based dataset featuring 20,000 stack configurations composed of a variety of elementary geometric primitives, richly annotated regarding semantics and structural stability. We train visual classifiers for binary stability prediction on the ShapeStacks data and scrutinise their learned physical intuition. Due to the richness of the training data, our approach also generalises favourably to real-world scenarios, achieving state-of-the-art stability prediction on a publicly available benchmark of block towers. We then leverage the physical intuition learned by our model to actively construct stable stacks and observe the emergence of an intuitive notion of stackability - an inherent object affordance - induced by the active stacking task. Our approach performs well even in challenging conditions where it considerably exceeds the stack height observed during training or in cases where initially unstable structures must be stabilised via counterbalancing.

* revised version to appear at ECCV 2018
