Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Alexander von Rohr

Viability of Future Actions: Robust Safety in Reinforcement Learning via Entropy Regularization

Jun 12, 2025

Pierre-François Massiani, Alexander von Rohr, Lukas Haverbeck, Sebastian Trimpe

Abstract:Despite the many recent advances in reinforcement learning (RL), the question of learning policies that robustly satisfy state constraints under unknown disturbances remains open. In this paper, we offer a new perspective on achieving robust safety by analyzing the interplay between two well-established techniques in model-free RL: entropy regularization, and constraints penalization. We reveal empirically that entropy regularization in constrained RL inherently biases learning toward maximizing the number of future viable actions, thereby promoting constraints satisfaction robust to action noise. Furthermore, we show that by relaxing strict safety constraints through penalties, the constrained RL problem can be approximated arbitrarily closely by an unconstrained one and thus solved using standard model-free RL. This reformulation preserves both safety and optimality while empirically improving resilience to disturbances. Our results indicate that the connection between entropy regularization and robustness is a promising avenue for further empirical and theoretical investigation, as it enables robust safety in RL through simple reward shaping.

* 24 pages, 11 figures, 2 tables. Accepted for publication at ECML-PKDD 2025

Via

Access Paper or Ask Questions

Diffusion Predictive Control with Constraints

Dec 12, 2024

Ralf Römer, Alexander von Rohr, Angela P. Schoellig

Abstract:Diffusion models have recently gained popularity for policy learning in robotics due to their ability to capture high-dimensional and multimodal distributions. However, diffusion policies are inherently stochastic and typically trained offline, limiting their ability to handle unseen and dynamic conditions where novel constraints not represented in the training data must be satisfied. To overcome this limitation, we propose diffusion predictive control with constraints (DPCC), an algorithm for diffusion-based control with explicit state and action constraints that can deviate from those in the training data. DPCC uses constraint tightening and incorporates model-based projections into the denoising process of a trained trajectory diffusion model. This allows us to generate constraint-satisfying, dynamically feasible, and goal-reaching trajectories for predictive control. We show through simulations of a robot manipulator that DPCC outperforms existing methods in satisfying novel test-time constraints while maintaining performance on the learned control task.

* Code: https://github.com/ralfroemer99/dpcc. 14 pages, 3 figures, 3 tables

Via

Access Paper or Ask Questions

Local Bayesian Optimization for Controller Tuning with Crash Constraints

Nov 25, 2024

Alexander von Rohr, David Stenger, Dominik Scheurenberg, Sebastian Trimpe

Abstract:Controller tuning is crucial for closed-loop performance but often involves manual adjustments. Although Bayesian optimization (BO) has been established as a data-efficient method for automated tuning, applying it to large and high-dimensional search spaces remains challenging. We extend a recently proposed local variant of BO to include crash constraints, where the controller can only be successfully evaluated in an a-priori unknown feasible region. We demonstrate the efficiency of the proposed method through simulations and hardware experiments. Our findings showcase the potential of local BO to enhance controller performance and reduce the time and resources necessary for tuning.

* von Rohr, Alexander, Stenger, David, Scheurenberg, Dominik and Trimpe, Sebastian. "Local Bayesian optimization for controller tuning with crash constraints" at - Automatisierungstechnik, vol. 72, no. 4, 2024, pp. 281-292
* Published in at-Automatisierungstechnik

Via

Access Paper or Ask Questions

Simulation-Aided Policy Tuning for Black-Box Robot Learning

Nov 21, 2024

Shiming He, Alexander von Rohr, Dominik Baumann, Ji Xiang, Sebastian Trimpe

Figure 1 for Simulation-Aided Policy Tuning for Black-Box Robot Learning

Figure 2 for Simulation-Aided Policy Tuning for Black-Box Robot Learning

Figure 3 for Simulation-Aided Policy Tuning for Black-Box Robot Learning

Figure 4 for Simulation-Aided Policy Tuning for Black-Box Robot Learning

Abstract:How can robots learn and adapt to new tasks and situations with little data? Systematic exploration and simulation are crucial tools for efficient robot learning. We present a novel black-box policy search algorithm focused on data-efficient policy improvements. The algorithm learns directly on the robot and treats simulation as an additional information source to speed up the learning process. At the core of the algorithm, a probabilistic model learns the dependence of the policy parameters and the robot learning objective not only by performing experiments on the robot, but also by leveraging data from a simulator. This substantially reduces interaction time with the robot. Using this model, we can guarantee improvements with high probability for each policy update, thereby facilitating fast, goal-oriented learning. We evaluate our algorithm on simulated fine-tuning tasks and demonstrate the data-efficiency of the proposed dual-information source optimization algorithm. In a real robot learning experiment, we show fast and successful task learning on a robot manipulator with the aid of an imperfect simulator.

Via

Access Paper or Ask Questions

Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

Oct 04, 2024

Oliver Hausdörfer, Alexander von Rohr, Éric Lefort, Angela Schoellig

Figure 1 for Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

Figure 2 for Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

Figure 3 for Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

Figure 4 for Latent Action Priors From a Single Gait Cycle Demonstration for Online Imitation Learning

Abstract:Deep Reinforcement Learning (DRL) in simulation often results in brittle and unrealistic learning outcomes. To push the agent towards more desirable solutions, prior information can be injected in the learning process through, for instance, reward shaping, expert data, or motion primitives. We propose an additional inductive bias for robot learning: latent actions learned from expert demonstration as priors in the action space. We show that these action priors can be learned from only a single open-loop gait cycle using a simple autoencoder. Using these latent action priors combined with established style rewards for imitation in DRL achieves above expert demonstration level of performance and leads to more desirable gaits. Further, action priors substantially improve the performance on transfer tasks, even leading to gait transitions for higher target speeds. Videos and code are available at https://sites.google.com/view/latent-action-priors.

* Submitted to ICRA 2025

Via

Access Paper or Ask Questions

Experience Transfer for Robust Direct Data-Driven Control

Jun 29, 2023

Alexander von Rohr, Dmitrii Likhachev, Sebastian Trimpe

Abstract:Learning-based control uses data to design efficient controllers for specific systems. When multiple systems are involved, experience transfer usually focuses on data availability and controller performance yet neglects robustness to variations between systems. In contrast, this letter explores experience transfer from a robustness perspective. We leverage the transfer to design controllers that are robust not only to the uncertainty regarding an individual agent's model but also to the choice of agent in a fleet. Experience transfer enables the design of safe and robust controllers that work out of the box for all systems in a heterogeneous fleet. Our approach combines scenario optimization and recent formulations for direct data-driven control without the need to estimate a model of the system or determine uncertainty bounds for its parameters. We demonstrate the benefits of our data-driven robustification method through a numerical case study and obtain learned controllers that generalize well from a small number of open-loop trajectories in a quadcopter simulation.

Via

Access Paper or Ask Questions

Multi-Arm Bin-Picking in Real-Time: A Combined Task and Motion Planning Approach

Nov 20, 2022

Ilyes Toumi, Andreas Orthey, Alexander von Rohr, Ngo Anh Vien

Figure 1 for Multi-Arm Bin-Picking in Real-Time: A Combined Task and Motion Planning Approach

Figure 2 for Multi-Arm Bin-Picking in Real-Time: A Combined Task and Motion Planning Approach

Figure 3 for Multi-Arm Bin-Picking in Real-Time: A Combined Task and Motion Planning Approach

Figure 4 for Multi-Arm Bin-Picking in Real-Time: A Combined Task and Motion Planning Approach

Abstract:Automated bin-picking is a prerequisite for fully automated manufacturing and warehouses. To successfully pick an item from an unstructured bin the robot needs to first detect possible grasps for the objects, decide on the object to remove and consequently plan and execute a feasible trajectory to retrieve the chosen object. Over the last years significant progress has been made towards solving these problems. However, when multiple robot arms are cooperating the decision and planning problems become exponentially harder. We propose an integrated multi-arm bin-picking pipeline (IMAPIP), and demonstrate that it is able to reliably pick objects from a bin in real-time using multiple robot arms. IMAPIP solves the multi-arm bin-picking task first at high-level using a geometry-aware policy integrated in a combined task and motion planning framework. We then plan motions consistent with this policy using the BIT* algorithm on the motion planning level. We show that this integrated solution enables robot arm cooperation. In our experiments, we show the proposed geometry-aware policy outperforms a baseline by increasing bin-picking time by 28\% using two robot arms. The policy is robust to changes in the position of the bin and number of objects. We also show that IMAPIP to successfully scale up to four robot arms working in close proximity.

* 8 pages

Via

Access Paper or Ask Questions

Event-Triggered Time-Varying Bayesian Optimization

Aug 23, 2022

Paul Brunzema, Alexander von Rohr, Friedrich Solowjow, Sebastian Trimpe

Figure 1 for Event-Triggered Time-Varying Bayesian Optimization

Figure 2 for Event-Triggered Time-Varying Bayesian Optimization

Figure 3 for Event-Triggered Time-Varying Bayesian Optimization

Figure 4 for Event-Triggered Time-Varying Bayesian Optimization

Abstract:We consider the problem of sequentially optimizing a time-varying objective function using time-varying Bayesian optimization (TVBO). Here, the key challenge is to cope with old data. Current approaches to TVBO require prior knowledge of a constant rate of change. However, the rate of change is usually neither known nor constant. We propose an event-triggered algorithm, ET-GP-UCB, that detects changes in the objective function online. The event-trigger is based on probabilistic uniform error bounds used in Gaussian process regression. The trigger automatically detects when significant change in the objective functions occurs. The algorithm then adapts to the temporal change by resetting the accumulated dataset. We provide regret bounds for ET-GP-UCB and show in numerical experiments that it is competitive with state-of-the-art algorithms even though it requires no knowledge about the temporal changes. Further, ET-GP-UCB outperforms these competitive baselines if the rate of change is misspecified and we demonstrate that it is readily applicable to various settings without tuning hyperparameters.

Via

Access Paper or Ask Questions

Improving the Performance of Robust Control through Event-Triggered Learning

Jul 28, 2022

Alexander von Rohr, Friedrich Solowjow, Sebastian Trimpe

Figure 1 for Improving the Performance of Robust Control through Event-Triggered Learning

Figure 2 for Improving the Performance of Robust Control through Event-Triggered Learning

Figure 3 for Improving the Performance of Robust Control through Event-Triggered Learning

Figure 4 for Improving the Performance of Robust Control through Event-Triggered Learning

Abstract:Robust controllers ensure stability in feedback loops designed under uncertainty but at the cost of performance. Model uncertainty in time-invariant systems can be reduced by recently proposed learning-based methods, thus improving the performance of robust controllers using data. However, in practice, many systems also exhibit uncertainty in the form of changes over time, e.g., due to weight shifts or wear and tear, leading to decreased performance or instability of the learning-based controller. We propose an event-triggered learning algorithm that decides when to learn in the face of uncertainty in the LQR problem with rare or slow changes. Our key idea is to switch between robust and learned controllers. For learning, we first approximate the optimal length of the learning phase via Monte-Carlo estimations using a probabilistic model. We then design a statistical test for uncertain systems based on the moment-generating function of the LQR cost. The test detects changes in the system under control and triggers re-learning when control performance deteriorates due to system changes. We demonstrate improved performance over a robust controller baseline in a numerical example.

* To appear in the proceedings of the 61st IEEE Conference on Decision and Control

Via

Access Paper or Ask Questions

On Controller Tuning with Time-Varying Bayesian Optimization

Jul 22, 2022

Paul Brunzema, Alexander von Rohr, Sebastian Trimpe

Figure 1 for On Controller Tuning with Time-Varying Bayesian Optimization

Figure 2 for On Controller Tuning with Time-Varying Bayesian Optimization

Figure 3 for On Controller Tuning with Time-Varying Bayesian Optimization

Figure 4 for On Controller Tuning with Time-Varying Bayesian Optimization

Abstract:Changing conditions or environments can cause system dynamics to vary over time. To ensure optimal control performance, controllers should adapt to these changes. When the underlying cause and time of change is unknown, we need to rely on online data for this adaptation. In this paper, we will use time-varying Bayesian optimization (TVBO) to tune controllers online in changing environments using appropriate prior knowledge on the control objective and its changes. Two properties are characteristic of many online controller tuning problems: First, they exhibit incremental and lasting changes in the objective due to changes to the system dynamics, e.g., through wear and tear. Second, the optimization problem is convex in the tuning parameters. Current TVBO methods do not explicitly account for these properties, resulting in poor tuning performance and many unstable controllers through over-exploration of the parameter space. We propose a novel TVBO forgetting strategy using Uncertainty-Injection (UI), which incorporates the assumption of incremental and lasting changes. The control objective is modeled as a spatio-temporal Gaussian process (GP) with UI through a Wiener process in the temporal domain. Further, we explicitly model the convexity assumptions in the spatial dimension through GP models with linear inequality constraints. In numerical experiments, we show that our model outperforms the state-of-the-art method in TVBO, exhibiting reduced regret and fewer unstable parameter configurations.

* To appear in the proceedings of the 61st IEEE Conference on Decision and Control

Via

Access Paper or Ask Questions