Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pavel Osinenko

Towards a constructive framework for control theory

Jan 04, 2025

Pavel Osinenko

Abstract:This work presents a framework for control theory based on constructive analysis to account for discrepancy between mathematical results and their implementation in a computer, also referred to as computational uncertainty. In control engineering, the latter is usually either neglected or considered submerged into some other type of uncertainty, such as system noise, and addressed within robust control. However, even robust control methods may be compromised when the mathematical objects involved in the respective algorithms fail to exist in exact form and subsequently fail to satisfy the required properties. For instance, in general stabilization using a control Lyapunov function, computational uncertainty may distort stability certificates or even destabilize the system despite robustness of the stabilization routine with regards to system, actuator and measurement noise. In fact, battling numerical problems in practical implementation of controllers is common among control engineers. Such observations indicate that computational uncertainty should indeed be addressed explicitly in controller synthesis and system analysis. The major contribution here is a fairly general framework for proof techniques in analysis and synthesis of control systems based on constructive analysis which explicitly states that every computation be doable only up to a finite precision thus accounting for computational uncertainty. A series of previous works is overviewed, including constructive system stability and stabilization, approximate optimal controls, eigenvalue problems, Caratheodory trajectories, measurable selectors. Additionally, a new constructive version of the Danskin's theorem, which is crucial in adversarial defense, is presented.

* in IEEE Control Systems Letters, vol. 6, pp. 379-384, 2022
* Published under: https://ieeexplore.ieee.org/document/9419858

Via

Access Paper or Ask Questions

Optimizing energy consumption for legged robot by adapting equilibrium position and stiffness of a parallel torsion spring

Nov 27, 2024

Danil Belov, Artem Erkhov, Farit Khabibullin, Elisaveta Pestova, Sergei Satsevich, Ilya Osokin, Pavel Osinenko, Dzmitry Tsetserukou

Abstract:This paper is dedicated to the development of a novel adaptive torsion spring mechanism for optimizing energy consumption in legged robots. By adjusting the equilibrium position and stiffness of the spring, the system improves energy efficiency during cyclic movements, such as walking and jumping. The adaptive compliance mechanism, consisting of a torsion spring combined with a worm gear driven by a servo actuator, compensates for motion-induced torque and reduces motor load. Simulation results demonstrate a significant reduction in power consumption, highlighting the effectiveness of this approach in enhancing robotic locomotion.

Via

Access Paper or Ask Questions

Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Sep 15, 2024

Pavel Osinenko, Grigory Yaremenko, Roman Zashchitin, Anton Bolychev, Sinan Ibrahim, Dmitrii Dobriborsci

Figure 1 for Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Figure 2 for Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Figure 3 for Critic as Lyapunov function (CALF): a model-free, stability-ensuring agent

Abstract:This work presents and showcases a novel reinforcement learning agent called Critic As Lyapunov Function (CALF) which is model-free and ensures online environment, in other words, dynamical system stabilization. Online means that in each learning episode, the said environment is stabilized. This, as demonstrated in a case study with a mobile robot simulator, greatly improves the overall learning performance. The base actor-critic scheme of CALF is analogous to SARSA. The latter did not show any success in reaching the target in our studies. However, a modified version thereof, called SARSA-m here, did succeed in some learning scenarios. Still, CALF greatly outperformed the said approach. CALF was also demonstrated to improve a nominal stabilizer provided to it. In summary, the presented agent may be considered a viable approach to fusing classical control with reinforcement learning. Its concurrent approaches are mostly either offline or model-based, like, for instance, those that fuse model-predictive control into the agent.

* IEEE Conference on Decision and Control. Accepted for publication in proceedings of the conference

Via

Access Paper or Ask Questions

An approach to improve agent learning via guaranteeing goal reaching in all episodes

May 29, 2024

Pavel Osinenko, Grigory Yaremenko, Georgiy Malaniya, Anton Bolychev

Abstract:Reinforcement learning is commonly concerned with problems of maximizing accumulated rewards in Markov decision processes. Oftentimes, a certain goal state or a subset of the state space attain maximal reward. In such a case, the environment may be considered solved when the goal is reached. Whereas numerous techniques, learning or non-learning based, exist for solving environments, doing so optimally is the biggest challenge. Say, one may choose a reward rate which penalizes the action effort. Reinforcement learning is currently among the most actively developed frameworks for solving environments optimally by virtue of maximizing accumulated reward, in other words, returns. Yet, tuning agents is a notoriously hard task as reported in a series of works. Our aim here is to help the agent learn a near-optimal policy efficiently while ensuring a goal reaching property of some basis policy that merely solves the environment. We suggest an algorithm, which is fairly flexible, and can be used to augment practically any agent as long as it comprises of a critic. A formal proof of a goal reaching property is provided. Simulation experiments on six problems under five agents, including the benchmarked one, provided an empirical evidence that the learning can indeed be boosted while ensuring goal reaching property.

Via

Access Paper or Ask Questions

Combining model-predictive control and predictive reinforcement learning for stable quadrupedal robot locomotion

Jul 15, 2023

Vyacheslav Kovalev, Anna Shkromada, Henni Ouerdane, Pavel Osinenko

Abstract:Stable gait generation is a crucial problem for legged robot locomotion as this impacts other critical performance factors such as, e.g. mobility over an uneven terrain and power consumption. Gait generation stability results from the efficient control of the interaction between the legged robot's body and the environment where it moves. Here, we study how this can be achieved by a combination of model-predictive and predictive reinforcement learning controllers. Model-predictive control (MPC) is a well-established method that does not utilize any online learning (except for some adaptive variations) as it provides a convenient interface for state constraints management. Reinforcement learning (RL), in contrast, relies on adaptation based on pure experience. In its bare-bone variants, RL is not always suitable for robots due to their high complexity and expensive simulation/experimentation. In this work, we combine both control methods to address the quadrupedal robot stable gate generation problem. The hybrid approach that we develop and apply uses a cost roll-out algorithm with a tail cost in the form of a Q-function modeled by a neural network; this allows to alleviate the computational complexity, which grows exponentially with the prediction horizon in a purely MPC approach. We demonstrate that our RL gait controller achieves stable locomotion at short horizons, where a nominal MP controller fails. Further, our controller is capable of live operation, meaning that it does not require previous training. Our results suggest that the hybridization of MPC with RL, as presented here, is beneficial to achieve a good balance between online control capabilities and computational complexity.

Via

Access Paper or Ask Questions

Experimental verification of an online traction parameter identification method

Mar 20, 2023

Alexander Kobelski, Pavel Osinenko, Stefan Streif

Abstract:Traction parameters, that characterize the ground-wheel contact dynamics, are the central factor in the energy efficiency of vehicles. To optimize fuel consumption, reduce wear of tires, increase productivity etc., knowledge of current traction parameters is unavoidable. Unfortunately, these parameters are difficult to measure and require expensive force and torque sensors. An alternative way is to use system identification to determine them. In this work, we validate such a method in field experiments with a mobile robot. The method is based on an adaptive Kalman filter. We show how it estimates the traction parameters online, during the motion on the field, and compare them to their values determined via a 6-directional force-torque sensor installed for verification. Data of adhesion slip ratio curves is recorded and compared to curves from literature for additional validation of the method. The results can establish a foundation for a number of optimal traction methods.

* Control Engineering Practice Volume 113, August 2021, 104837

Via

Access Paper or Ask Questions

A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Aug 31, 2022

Lukas Beckenbach, Pavel Osinenko, Stefan Streif

Figure 1 for A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Figure 2 for A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Figure 3 for A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Figure 4 for A stabilizing reinforcement learning approach for sampled systems with partially unknown models

Abstract:Reinforcement learning is commonly associated with training of reward-maximizing (or cost-minimizing) agents, in other words, controllers. It can be applied in model-free or model-based fashion, using a priori or online collected system data to train involved parametric architectures. In general, online reinforcement learning does not guarantee closed loop stability unless special measures are taken, for instance, through learning constraints or tailored training rules. Particularly promising are hybrids of reinforcement learning with "classical" control approaches. In this work, we suggest a method to guarantee practical stability of the system-controller closed loop in a purely online learning setting, i.e., without offline training. Moreover, we assume only partial knowledge of the system model. To achieve the claimed results, we employ techniques of classical adaptive control. The implementation of the overall control scheme is provided explicitly in a digital, sampled setting. That is, the controller receives the state of the system and computes the control action at discrete, specifically, equidistant moments in time. The method is tested in adaptive traction control and cruise control where it proved to significantly reduce the cost.

Via

Access Paper or Ask Questions

On stabilizing reinforcement learning without Lyapunov functions

Jul 18, 2022

Pavel Osinenko, Grigory Yaremenko, Georgiy Malaniya

Figure 1 for On stabilizing reinforcement learning without Lyapunov functions

Figure 2 for On stabilizing reinforcement learning without Lyapunov functions

Figure 3 for On stabilizing reinforcement learning without Lyapunov functions

Figure 4 for On stabilizing reinforcement learning without Lyapunov functions

Abstract:Reinforcement learning remains one of the major directions of the contemporary development of control engineering and machine learning. Nice intuition, flexible settings, ease of application are among the many perks of this methodology. From the standpoint of machine learning, the main strength of a reinforcement learning agent is its ability to ``capture" (learn) the optimal behavior in the given environment. Typically, the agent is built on neural networks and it is their approximation abilities that give rise to the above belief. From the standpoint of control engineering, however, reinforcement learning has serious deficiencies. The most significant one is the lack of stability guarantee of the agent-environment closed loop. A great deal of research was and is being made towards stabilizing reinforcement learning. Speaking of stability, the celebrated Lyapunov theory is the de facto tool. It is thus no wonder that so many techniques of stabilizing reinforcement learning rely on the Lyapunov theory in one way or another. In control theory, there is an intricate connection between a stabilizing controller and a Lyapunov function. Employing such a pair seems thus quite attractive to design stabilizing reinforcement learning. However, computation of a Lyapunov function is generally a cumbersome process. In this note, we show how to construct a stabilizing reinforcement learning agent that does not employ such a function at all. We only assume that a Lyapunov function exists, which is a natural thing to do if the given system (read: environment) is stabilizable, but we do not need to compute one.

Via

Access Paper or Ask Questions

On stochastic stabilization via non-smooth control Lyapunov functions

May 26, 2022

Pavel Osinenko, Grigory Yaremenko, Georgiy Malaniya

Figure 1 for On stochastic stabilization via non-smooth control Lyapunov functions

Abstract:Control Lyapunov function is a central tool in stabilization. It generalizes an abstract energy function -- a Lyapunov function -- to the case of controlled systems. It is a known fact that most control Lyapunov functions are non-smooth -- so is the case in non-holonomic systems, like wheeled robots and cars. Frameworks for stabilization using non-smooth control Lyapunov functions exist, like Dini aiming and steepest descent. This work generalizes the related results to the stochastic case. As the groundwork, sampled control scheme is chosen in which control actions are computed at discrete moments in time using discrete measurements of the system state. In such a setup, special attention should be paid to the sample-to-sample behavior of the control Lyapunov function. A particular challenge here is a random noise acting on the system. The central result of this work is a theorem that states, roughly, that if there is a, generally non-smooth, control Lyapunov function, the given stochastic dynamical system can be practically stabilized in the sample-and-hold mode meaning that the control actions are held constant within sampling time steps. A particular control method chosen is based on Moreau-Yosida regularization, in other words, inf-convolution of the control Lyapunov function, but the overall framework is extendable to further control schemes. It is assumed that the system noise be bounded almost surely, although the case of unbounded noise is briefly addressed.

* IEEE Transactions on Automatic Control

Via

Access Paper or Ask Questions

A comment on stabilizing reinforcement learning

Nov 24, 2021

Pavel Osinenko, Georgiy Malaniya, Grigory Yaremenko, Ilya Osokin

Abstract:This is a short comment on the paper "Asymptotically Stable Adaptive-Optimal Control Algorithm With Saturating Actuators and Relaxed Persistence of Excitation" by Vamvoudakis et al. The question of stability of reinforcement learning (RL) agents remains hard and the said work suggested an on-policy approach with a suitable stability property using a technique from adaptive control - a robustifying term to be added to the action. However, there is an issue with this approach to stabilizing RL, which we will explain in this note. Furthermore, Vamvoudakis et al. seems to have made a fallacious assumption on the Hamiltonian under a generic policy. To provide a positive result, we will not only indicate this mistake, but show critic neural network weight convergence under a stochastic, continuous-time environment, provided certain conditions on the behavior policy hold.

Via

Access Paper or Ask Questions