Johannes A. Stork

Inverse Optimization Latent Variable Models for Learning Costs Applied to Route Problems

Sep 19, 2025

Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Abstract:Learning representations for solutions of constrained optimization problems (COPs) with unknown cost functions is challenging, as models like (Variational) Autoencoders struggle to enforce constraints when decoding structured outputs. We propose an Inverse Optimization Latent Variable Model (IO-LVM) that learns a latent space of COP cost functions from observed solutions and reconstructs feasible outputs by solving a COP with a solver in the loop. Our approach leverages estimated gradients of a Fenchel-Young loss through a non-differentiable deterministic solver to shape the latent space. Unlike standard Inverse Optimization or Inverse Reinforcement Learning methods, which typically recover a single or context-specific cost function, IO-LVM captures a distribution over cost functions, enabling the identification of diverse solution behaviors arising from different agents or conditions not available during the training process. We validate our method on real-world datasets of ship and taxi routes, as well as paths in synthetic graphs, demonstrating its ability to reconstruct paths and cycles, predict their distributions, and yield interpretable latent representations.

* Accepted at NeurIPS 2025
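
The abstract describes shaping cost estimates with gradients of a Fenchel-Young loss taken through a non-differentiable solver. The sketch below is not the authors' implementation: it has no latent variable model and no perturbation-based gradient estimate. It only illustrates, on a hypothetical four-node graph, the plain subgradient of the Fenchel-Young loss for a linear-cost shortest path problem, which is the observed edge indicator minus the solver's edge indicator. All function names, the graph, the learning rate, and the cost floor are assumptions made for illustration.

```python
import heapq


def shortest_path_edges(n_nodes, edges, costs, source, target):
    """Dijkstra solver: returns a 0/1 indicator over `edges` for the cheapest path."""
    adj = {u: [] for u in range(n_nodes)}
    for idx, (u, v) in enumerate(edges):
        adj[u].append((v, idx))
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        if u == target:
            break
        for v, idx in adj[u]:
            nd = d + costs[idx]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, idx)
                heapq.heappush(heap, (nd, v))
    indicator = [0] * len(edges)
    node = target
    while node != source:  # walk back along predecessor links
        node, idx = prev[node]
        indicator[idx] = 1
    return indicator


def fit_costs(n_nodes, edges, costs, observed, source, target, lr=0.1, max_iters=100):
    """Subgradient descent on the Fenchel-Young loss
    L(costs) = costs . observed - min_y costs . y,
    whose subgradient is (observed - predicted)."""
    for _ in range(max_iters):
        predicted = shortest_path_edges(n_nodes, edges, costs, source, target)
        if predicted == observed:
            break  # zero subgradient: the solver reproduces the observation
        # Lower costs on observed edges, raise them on wrongly predicted edges.
        # The floor keeps costs positive so Dijkstra stays valid (an assumption).
        costs = [max(1e-3, c - lr * (o - p))
                 for c, o, p in zip(costs, observed, predicted)]
    return costs, predicted


# Hypothetical example: two routes from node 0 to node 3.
edges = [(0, 1), (1, 3), (0, 2), (2, 3)]
observed = [0, 0, 1, 1]  # the demonstrated route is 0 -> 2 -> 3
learned, predicted = fit_costs(4, edges, [1.0, 1.0, 2.0, 2.0], observed, 0, 3)
```

Because the solver is called inside the training loop, every prediction is a feasible path by construction, which is the property the abstract contrasts with decoding structured outputs from a standard autoencoder.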

ZipMPC: Compressed Context-Dependent MPC Cost via Imitation Learning

Jul 17, 2025

Rahel Rickenbach, Alan A. Lahoud, Erik Schaffernicht, Melanie N. Zeilinger, Johannes A. Stork

Abstract:The computational burden of model predictive control (MPC) limits its application on real-time systems, such as robots, and often requires the use of short prediction horizons. This not only affects the control performance, but also increases the difficulty of designing MPC cost functions that reflect the desired long-term objective. This paper proposes ZipMPC, a method that imitates a long-horizon MPC behaviour by learning a compressed and context-dependent cost function for a short-horizon MPC. It improves performance over alternative methods, such as approximate explicit MPC and automatic cost parameter tuning, in particular in terms of i) optimizing the long term objective; ii) maintaining computational costs comparable to a short-horizon MPC; iii) ensuring constraint satisfaction; and iv) generalizing control behaviour to environments not observed during training. For this purpose, ZipMPC leverages the concept of differentiable MPC with neural networks to propagate gradients of the imitation loss through the MPC optimization. We validate our proposed method in simulation and real-world experiments on autonomous racing. ZipMPC consistently completes laps faster than selected baselines, achieving lap times close to the long-horizon MPC baseline. In challenging scenarios where the short-horizon MPC baseline fails to complete a lap, ZipMPC is able to do so. In particular, these performance gains are also observed on tracks unseen during training.

On the Fly Adaptation of Behavior Tree-Based Policies through Reinforcement Learning

Mar 08, 2025

Marco Iannotta, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov

Abstract:With the rising demand for flexible manufacturing, robots are increasingly expected to operate in dynamic environments where local task variations -- such as slight offsets or size differences in workpieces -- are common. We propose to address the problem of adapting robot behaviors to these task variations with a sample-efficient hierarchical reinforcement learning approach adapting Behavior Tree (BT)-based policies. We maintain the core BT properties as an interpretable, modular framework for structuring reactive behaviors, but extend their use beyond static tasks by inherently accommodating local task variations. To show the efficiency and effectiveness of our approach, we conduct experiments both in simulation and on a Franka Emika Panda 7-DoF, with the manipulator adapting to different obstacle avoidance and pivoting tasks.

THÖR-MAGNI Act: Actions for Human Motion Modeling in Robot-Shared Industrial Spaces

Dec 18, 2024

Tiago Rodrigues de Almeida, Tim Schreiter, Andrey Rudenko, Luigi Palmieri, Johannes A. Stork, Achim J. Lilienthal

Abstract:Accurate human activity and trajectory prediction are crucial for ensuring safe and reliable human-robot interactions in dynamic environments, such as industrial settings, with mobile robots. Datasets with fine-grained action labels for moving people in industrial environments with mobile robots are scarce, as most existing datasets focus on social navigation in public spaces. This paper introduces the THÖR-MAGNI Act dataset, a substantial extension of the THÖR-MAGNI dataset, which captures participant movements alongside robots in diverse semantic and spatial contexts. THÖR-MAGNI Act provides 8.3 hours of manually labeled participant actions derived from egocentric videos recorded via eye-tracking glasses. These actions, aligned with the provided THÖR-MAGNI motion cues, follow a long-tailed distribution with diversified acceleration, velocity, and navigation distance profiles. We demonstrate the utility of THÖR-MAGNI Act for two tasks: action-conditioned trajectory prediction and joint action and trajectory prediction. We propose two efficient transformer-based models that outperform the baselines to address these tasks. These results underscore the potential of THÖR-MAGNI Act to develop predictive models for enhanced human-robot interaction in complex environments.

* This paper has been accepted to the 20th edition of the IEEE/ACM International Conference on Human-Robot Interaction (HRI'25), which will be held in Melbourne, Australia on March 4-6, 2025. Code: https://github.com/tmralmeida/thor-magni-actions

Learning Solutions of Stochastic Optimization Problems with Bayesian Neural Networks

Jun 05, 2024

Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Abstract:Mathematical solvers use parametrized Optimization Problems (OPs) as inputs to yield optimal decisions. In many real-world settings, some of these parameters are unknown or uncertain. Recent research focuses on predicting the value of these unknown parameters using available contextual features, aiming to decrease decision regret by adopting end-to-end learning approaches. However, these approaches disregard prediction uncertainty and therefore make the mathematical solver susceptible to providing erroneous decisions in case of low-confidence predictions. We propose a novel framework that models prediction uncertainty with Bayesian Neural Networks (BNNs) and propagates this uncertainty into the mathematical solver with a Stochastic Programming technique. The differentiable nature of BNNs and differentiable mathematical solvers allow for two different learning approaches: In the Decoupled learning approach, we update the BNN weights to increase the quality of the predictions' distribution of the OP parameters, while in the Combined learning approach, we update the weights aiming to directly minimize the expected OP's cost function in a stochastic end-to-end fashion. We do an extensive evaluation using synthetic data with various noise properties and a real dataset, showing that decision regret is generally lower (better) with both proposed methods.

DataSP: A Differential All-to-All Shortest Path Algorithm for Learning Costs and Predicting Paths with Context

May 08, 2024

Alan A. Lahoud, Erik Schaffernicht, Johannes A. Stork

Abstract:Learning latent costs of transitions on graphs from trajectory demonstrations under various contextual features is challenging but useful for path planning. Yet, existing methods either oversimplify cost assumptions or scale poorly with the number of observed trajectories. This paper introduces DataSP, a differentiable all-to-all shortest path algorithm to facilitate learning latent costs from trajectories. It allows learning from a large number of trajectories in each learning step without additional computation. Complex latent cost functions from contextual features can be represented in the algorithm through a neural network approximation. We further propose a method to sample paths from DataSP in order to reconstruct/mimic observed paths' distributions. We prove that the inferred distribution follows the maximum entropy principle. We show that DataSP outperforms state-of-the-art differentiable combinatorial solvers and classical machine learning approaches in predicting paths on graphs.
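
The abstract does not say how the all-to-all shortest path computation is made differentiable. One common relaxation, assumed here purely for illustration and not confirmed by the text above, is to run the Floyd-Warshall recursion with the hard `min` replaced by a smooth soft-minimum. The sketch below shows only the forward pass in plain Python. The function names, the example graph, and the temperature values are hypothetical.

```python
import math

INF = float("inf")


def softmin(a, b, temperature):
    """Smooth lower bound on min(a, b): -t * log(exp(-a/t) + exp(-b/t)).
    Approaches the hard minimum as temperature -> 0."""
    if math.isinf(a):
        return b
    if math.isinf(b):
        return a
    m = min(a, b)  # subtract the minimum for numerical stability
    return m - temperature * math.log(
        math.exp(-(a - m) / temperature) + math.exp(-(b - m) / temperature)
    )


def smoothed_floyd_warshall(cost, temperature):
    """All-pairs path costs using the Floyd-Warshall recursion with softmin."""
    n = len(cost)
    dist = [row[:] for row in cost]
    for k in range(n):
        # Build a fresh matrix so every update in round k reads round k-1 values.
        dist = [
            [softmin(dist[i][j], dist[i][k] + dist[k][j], temperature)
             for j in range(n)]
            for i in range(n)
        ]
    return dist


# Hypothetical directed graph: cost[i][j] is the edge cost, INF if there is no edge.
cost = [
    [0.0, 1.0, 4.0, INF],
    [INF, 0.0, 2.0, 6.0],
    [INF, INF, 0.0, 1.0],
    [INF, INF, INF, 0.0],
]
sharp = smoothed_floyd_warshall(cost, temperature=0.001)
smooth = smoothed_floyd_warshall(cost, temperature=1.0)
```

With a small temperature the result is close to the exact shortest path costs (the cheapest route from node 0 to node 3 costs 4 in this graph). A larger temperature blends in contributions from more expensive routes, so every edge cost receives a nonzero gradient once the computation is expressed in an automatic differentiation framework.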

Learning Extrinsic Dexterity with Parameterized Manipulation Primitives

Nov 02, 2023

Shih-Min Yang, Martin Magnusson, Johannes A. Stork, Todor Stoyanov

Abstract:Many practically relevant robot grasping problems feature a target object for which all grasps are occluded, e.g., by the environment. Single-shot grasp planning invariably fails in such scenarios. Instead, it is necessary to first manipulate the object into a configuration that affords a grasp. We solve this problem by learning a sequence of actions that utilize the environment to change the object's pose. Concretely, we employ hierarchical reinforcement learning to combine a sequence of learned parameterized manipulation primitives. By learning the low-level manipulation policies, our approach can control the object's state through exploiting interactions between the object, the gripper, and the environment. Designing such a complex behavior analytically would be infeasible under uncontrolled conditions, as an analytic approach requires accurate physical modeling of the interaction and contact dynamics. In contrast, we learn a hierarchical policy model that operates directly on depth perception data, without the need for object detection, pose estimation, or manual design of controllers. We evaluate our approach on picking box-shaped objects of various weight, shape, and friction properties from a constrained table-top workspace. Our method transfers to a real robot and is able to successfully complete the object picking task in 98% of experimental trials.

Heterogeneous Full-body Control of a Mobile Manipulator with Behavior Trees

Oct 16, 2022

Marco Iannotta, David Cáceres Domínguez, Johannes A. Stork, Erik Schaffernicht, Todor Stoyanov

Abstract:Integrating the heterogeneous controllers of a complex mechanical system, such as a mobile manipulator, within the same structure and in a modular way is still challenging. In this work we extend our framework based on Behavior Trees for the control of a redundant mechanical system to the problem of commanding more complex systems that involve multiple low-level controllers. This allows the integrated systems to achieve non-trivial goals that require coordination among the sub-systems.

* arXiv admin note: substantial text overlap with arXiv:2209.08619

Towards Task-Prioritized Policy Composition

Sep 20, 2022

Finn Rietz, Erik Schaffernicht, Todor Stoyanov, Johannes A. Stork

Abstract:Combining learned policies in a prioritized, ordered manner is desirable because it allows for modular design and facilitates data reuse through knowledge transfer. In control theory, prioritized composition is realized by null-space control, where low-priority control actions are projected into the null-space of high-priority control actions. Such a method is currently unavailable for Reinforcement Learning. We propose a novel, task-prioritized composition framework for Reinforcement Learning, which involves a novel concept: The indifferent-space of Reinforcement Learning policies. Our framework has the potential to facilitate knowledge transfer and modular design while greatly increasing data efficiency and data reuse for Reinforcement Learning agents. Further, our approach can ensure high-priority constraint satisfaction, which makes it promising for learning in safety-critical domains like robotics. Unlike null-space control, our approach allows learning globally optimal policies for the compound task by online learning in the indifference-space of higher-level policies after initial compound policy construction.
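
For readers unfamiliar with the control-theory baseline the abstract refers to, the following is a minimal sketch of classical null-space composition, not of the paper's Reinforcement Learning framework. It assumes a single high-priority task with a one-row Jacobian `j1`, for which the pseudoinverse has a closed form. The function names and numeric values are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def compose_prioritized(j1, x1_dot, q2_dot):
    """Null-space control for a single-row task Jacobian j1:
        q_dot = J1^+ * x1_dot + (I - J1^+ J1) * q2_dot
    where J1^+ = j1^T / (j1 . j1) is the pseudoinverse of a row vector."""
    norm_sq = dot(j1, j1)
    if norm_sq == 0.0:
        raise ValueError("high-priority Jacobian must be nonzero")
    # Minimum-norm joint velocity that achieves the high-priority task velocity.
    primary = [j * x1_dot / norm_sq for j in j1]
    # Remove the component of the low-priority command that would disturb task 1.
    scale = dot(j1, q2_dot) / norm_sq
    secondary = [q - scale * j for q, j in zip(q2_dot, j1)]
    return [p + s for p, s in zip(primary, secondary)]


# Hypothetical 3-joint example.
j1 = [1.0, 2.0, 0.0]        # high-priority task Jacobian (one row)
x1_dot = 0.5                # desired high-priority task velocity
q2_dot = [0.3, -0.1, 0.8]   # joint velocity requested by the low-priority task
q_dot = compose_prioritized(j1, x1_dot, q2_dot)
```

The composed command satisfies the high-priority task exactly (`dot(j1, q_dot)` equals `x1_dot` up to rounding), while the third joint, which does not affect that task, follows the low-priority request unchanged.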

Transferring Knowledge for Reinforcement Learning in Contact-Rich Manipulation

Sep 19, 2022

Quantao Yang, Johannes A. Stork, Todor Stoyanov

Abstract:In manufacturing, assembly tasks have been a challenge for learning algorithms due to variant dynamics of different environments. Reinforcement learning (RL) is a promising framework to automatically learn these tasks, yet it is still not easy to apply a learned policy or skill, that is, the ability to solve a task, to a similar environment even if the deployment conditions are only slightly different. In this paper, we address the challenge of transferring knowledge within a family of similar tasks by leveraging multiple skill priors. We propose to learn a prior distribution over the specific skill required to accomplish each task and compose the family of skill priors to guide learning the policy for a new task by comparing the similarity between the target task and the prior ones. Our method learns a latent action space representing the skill embedding from demonstrated trajectories for each prior task. We have evaluated our method on a set of peg-in-hole insertion tasks and demonstrate better generalization to new tasks that have never been encountered during training.
