Boris Houska

Operator Models for Continuous-Time Offline Reinforcement Learning

Nov 13, 2025

Nicolas Hoischen, Petar Bevanda, Max Beier, Stefan Sosnowski, Boris Houska, Sandra Hirche

Abstract: Continuous-time stochastic processes underlie many natural and engineered systems. In healthcare, autonomous driving, and industrial control, direct interaction with the environment is often unsafe or impractical, motivating offline reinforcement learning from historical data. However, there is limited statistical understanding of the approximation errors inherent in learning policies from offline datasets. We address this by linking reinforcement learning to the Hamilton-Jacobi-Bellman equation and proposing an operator-theoretic algorithm based on a simple dynamic programming recursion. Specifically, we represent our world model in terms of the infinitesimal generator of controlled diffusion processes learned in a reproducing kernel Hilbert space. By integrating statistical learning methods and operator theory, we establish global convergence of the value function and derive finite-sample guarantees with bounds tied to system properties such as smoothness and stability. Our theoretical and numerical results indicate that operator-based approaches may hold promise in solving offline reinforcement learning using continuous-time optimal control.
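
For orientation, the objects named in this abstract can be written in their generic textbook form. The notation below (drift f, diffusion coefficient σ, stage cost ℓ, discount rate ρ) is assumed here for illustration and is not taken from the paper, which may use a different cost setting or parametrization.

```latex
\begin{aligned}
% Controlled diffusion (assumed notation)
\mathrm{d}X_t &= f(X_t, u_t)\,\mathrm{d}t + \sigma(X_t)\,\mathrm{d}W_t, \\
% Infinitesimal generator acting on a smooth test function V
(\mathcal{L}^{u} V)(x) &= \nabla V(x)^{\top} f(x,u)
  + \tfrac{1}{2}\operatorname{tr}\!\big(\sigma(x)\sigma(x)^{\top}\nabla^{2}V(x)\big), \\
% Hamilton-Jacobi-Bellman equation, discounted infinite-horizon case (assumed)
\rho\,V(x) &= \min_{u}\,\big\{\ell(x,u) + (\mathcal{L}^{u}V)(x)\big\}.
\end{aligned}
```

In this form the generator is the only term that depends on the unknown dynamics, which is consistent with the abstract's choice to learn it from data.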

Kernel-Based Optimal Control: An Infinitesimal Generator Approach

Dec 02, 2024

Petar Bevanda, Nicolas Hoischen, Tobias Wittmann, Jan Brüdigam, Sandra Hirche, Boris Houska

Abstract: This paper presents a novel approach for optimal control of nonlinear stochastic systems using infinitesimal generator learning within infinite-dimensional reproducing kernel Hilbert spaces. Our learning framework leverages data samples of system dynamics and stage cost functions, with only control penalties and constraints provided. The proposed method directly learns the diffusion operator of a controlled Fokker-Planck-Kolmogorov equation in an infinite-dimensional hypothesis space. This operator models the continuous-time evolution of the probability measure of the control system's state. We demonstrate that this approach seamlessly integrates with modern convex operator-theoretic Hamilton-Jacobi-Bellman recursions, enabling a data-driven solution to the optimal control problem. Furthermore, our statistical learning framework includes nonparametric estimators for uncontrolled forward infinitesimal generators as a special case. Numerical experiments, ranging from synthetic differential equations to simulated robotic systems, showcase the advantages of our approach compared to both modern data-driven and classical nonlinear programming methods for optimal control.

Real-Time-Feasible Collision-Free Motion Planning For Ellipsoidal Objects

Sep 18, 2024

Yunfan Gao, Florian Messerer, Niels van Duijkeren, Boris Houska, Moritz Diehl

Abstract: Online planning of collision-free trajectories is a fundamental task for robotics and self-driving car applications. This paper revisits collision avoidance between ellipsoidal objects using differentiable constraints. Two ellipsoids do not overlap if and only if the endpoint of the vector between the center points of the ellipsoids does not lie in the interior of the Minkowski sum of the ellipsoids. This condition is formulated using a parametric over-approximation of the Minkowski sum, which can be made tight in any given direction. The resulting collision avoidance constraint is included in an optimal control problem (OCP) and evaluated in comparison to the separating-hyperplane approach. Not only do we observe that the Minkowski-sum formulation is computationally more efficient in our experiments, but also that using pre-determined over-approximation parameters based on warm-start trajectories leads to a very limited increase in suboptimality. This gives rise to a novel real-time scheme for collision-free motion planning with model predictive control (MPC). Both the real-time feasibility and the effectiveness of the constraint formulation are demonstrated in challenging real-world experiments.
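
The Minkowski-sum condition described in this abstract can be sketched with a classical one-parameter outer ellipsoid. This is a minimal illustration under stated assumptions, not the paper's formulation: each ellipsoid is assumed to be {x : (x - c)^T Q^{-1} (x - c) <= 1} with a symmetric positive definite shape matrix Q, and the family (1 + 1/beta) Q1 + (1 + beta) Q2 with beta > 0 is the textbook over-approximation, which may differ from the parametrization the authors use. The function names are hypothetical.

```python
import numpy as np

def tight_beta(Q1, Q2, direction):
    """Value of beta for which the outer ellipsoid touches the Minkowski sum
    in the given direction (their support functions coincide there)."""
    a = np.sqrt(direction @ Q1 @ direction)
    b = np.sqrt(direction @ Q2 @ direction)
    return a / b

def outer_shape(Q1, Q2, beta):
    """Shape matrix of an ellipsoid containing E(Q1) + E(Q2), for any beta > 0."""
    return (1.0 + 1.0 / beta) * Q1 + (1.0 + beta) * Q2

def certified_separated(c1, Q1, c2, Q2, beta=None):
    """Sufficient (in general conservative) test that two ellipsoids do not overlap:
    the center difference must not lie in the interior of the outer ellipsoid."""
    d = np.asarray(c2, dtype=float) - np.asarray(c1, dtype=float)
    if not np.any(d):
        return False  # identical centers always overlap
    if beta is None:
        # Assumption: make the bound tight along the line joining the centers.
        beta = tight_beta(Q1, Q2, d)
    Q = outer_shape(Q1, Q2, beta)
    return float(d @ np.linalg.solve(Q, d)) >= 1.0

if __name__ == "__main__":
    Q = np.eye(2)  # two unit discs
    print(certified_separated([0.0, 0.0], Q, [3.0, 0.0], Q))
    print(certified_separated([0.0, 0.0], Q, [1.5, 0.0], Q))
```

Because the test is a smooth function of the centers and shape matrices once beta is fixed, it can in principle serve as a differentiable constraint, which fits the abstract's remark about pre-determined over-approximation parameters.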

Data-Driven Optimal Feedback Laws via Kernel Mean Embeddings

Jul 23, 2024

Petar Bevanda, Nicolas Hoischen, Stefan Sosnowski, Sandra Hirche, Boris Houska

Abstract: This paper proposes a fully data-driven approach for optimal control of nonlinear control-affine systems represented by a stochastic diffusion. The focus is on the scenario where both the nonlinear dynamics and stage cost functions are unknown, while only the control penalty function and constraints are provided. Leveraging the theory of reproducing kernel Hilbert spaces, we introduce novel kernel mean embeddings (KMEs) to identify the Markov transition operators associated with controlled diffusion processes. The KME learning approach seamlessly integrates with modern convex operator-theoretic Hamilton-Jacobi-Bellman recursions. Thus, unlike traditional dynamic programming methods, our approach exploits the "kernel trick" to break the curse of dimensionality. We demonstrate the effectiveness of our method through numerical examples, highlighting its ability to solve a large class of nonlinear optimal control problems.

* author-submitted electronic preprint version: 16 pages, 3 figures, 4 tables
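
As a rough illustration of how a kernel mean embedding can stand in for a Markov transition operator, the sketch below uses the standard regularized conditional mean embedding estimator for an uncontrolled system. It is not the estimator introduced in the paper: the Gaussian kernel, the bandwidth, the regularization value, and all function names are assumptions chosen for the example.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth):
    """Gaussian kernel matrix between the rows of A (n, d) and B (m, d)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def fit_embedding_weights(X, bandwidth, reg):
    """Return a function mapping query states to weights over the training samples.

    The weights w(x) = (K + n * reg * I)^{-1} k(x) define the empirical
    conditional mean embedding of the next state given the current state x.
    """
    n = X.shape[0]
    K_reg = gaussian_gram(X, X, bandwidth) + n * reg * np.eye(n)

    def weights(x_query):
        k = gaussian_gram(X, np.atleast_2d(x_query), bandwidth)  # shape (n, m)
        return np.linalg.solve(K_reg, k)

    return weights

def conditional_expectation(weights, f_next, x_query):
    """Estimate E[f(X') | X = x] from f evaluated at the sampled next states."""
    return f_next @ weights(x_query)

if __name__ == "__main__":
    # Toy data: deterministic contraction x' = 0.5 * x, sampled on a grid.
    X = np.linspace(-1.0, 1.0, 50)[:, None]
    X_next = 0.5 * X
    w = fit_embedding_weights(X, bandwidth=0.5, reg=1e-6)
    print(conditional_expectation(w, X_next[:, 0], np.array([0.3])))
```

Only Gram matrices over the samples appear in the computation, so the cost scales with the number of data points rather than with a grid over the state space. That is the sense in which the "kernel trick" sidesteps state-space discretization.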
