Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Norikazu Sugimoto

Goal-Conditioned Terminal Value Estimation for Real-time and Multi-task Model Predictive Control

Oct 07, 2024

Mitsuki Morita, Satoshi Yamamori, Satoshi Yagi, Norikazu Sugimoto, Jun Morimoto

Abstract:While MPC enables nonlinear feedback control by solving an optimal control problem at each timestep, the computational burden tends to be significantly large, making it difficult to optimize a policy within the control period. To address this issue, one possible approach is to utilize terminal value learning to reduce computational costs. However, the learned value cannot be used for other tasks in situations where the task dynamically changes in the original MPC setup. In this study, we develop an MPC framework with goal-conditioned terminal value learning to achieve multitask policy optimization while reducing computational time. Furthermore, by using a hierarchical control structure that allows the upper-level trajectory planner to output appropriate goal-conditioned trajectories, we demonstrate that a robot model is able to generate diverse motions. We evaluate the proposed method on a bipedal inverted pendulum robot model and confirm that combining goal-conditioned terminal value learning with an upper-level trajectory planner enables real-time control; thus, the robot successfully tracks a target trajectory on sloped terrain.

* 16 pages, 9 figures

Via

Access Paper or Ask Questions

Efficient Reuse of Previous Experiences to Improve Policies in Real Environment

May 10, 2014

Norikazu Sugimoto, Voot Tangkaratt, Thijs Wensveen, Tingting Zhao, Masashi Sugiyama, Jun Morimoto

Figure 1 for Efficient Reuse of Previous Experiences to Improve Policies in Real Environment

Figure 2 for Efficient Reuse of Previous Experiences to Improve Policies in Real Environment

Figure 3 for Efficient Reuse of Previous Experiences to Improve Policies in Real Environment

Figure 4 for Efficient Reuse of Previous Experiences to Improve Policies in Real Environment

Abstract:In this study, we show that a movement policy can be improved efficiently using the previous experiences of a real robot. Reinforcement Learning (RL) is becoming a popular approach to acquire a nonlinear optimal policy through trial and error. However, it is considered very difficult to apply RL to real robot control since it usually requires many learning trials. Such trials cannot be executed in real environments because unrealistic time is necessary and the real system's durability is limited. Therefore, in this study, instead of executing many learning trials, we propose to use a recently developed RL algorithm, importance-weighted PGPE, by which the robot can efficiently reuse previously sampled data to improve it's policy parameters. We apply importance-weighted PGPE to CB-i, our real humanoid robot, and show that it can learn a target reaching movement and a cart-pole swing up movement in a real environment without using any prior knowledge of the task or any carefully designed initial trajectory.

* 14 pages, 9 figures, http://www.cns.atr.jp/bri/en/. arXiv admin note: substantial text overlap with arXiv:1301.3966

Via

Access Paper or Ask Questions