Mingguo Zhao

Booster Gym: An End-to-End Reinforcement Learning Framework for Humanoid Robot Locomotion

Jun 18, 2025

Yushi Wang, Penghui Chen, Xinyu Han, Feng Wu, Mingguo Zhao

Abstract:Recent advancements in reinforcement learning (RL) have led to significant progress in humanoid robot locomotion, simplifying the design and training of motion policies in simulation. However, the numerous implementation details make transferring these policies to real-world robots a challenging task. To address this, we have developed a comprehensive code framework that covers the entire process from training to deployment, incorporating common RL training methods, domain randomization, reward function design, and solutions for handling parallel structures. This library is made available as a community resource, with detailed descriptions of its design and experimental results. We validate the framework on the Booster T1 robot, demonstrating that the trained policies seamlessly transfer to the physical platform, enabling capabilities such as omnidirectional walking, disturbance resistance, and terrain adaptability. We hope this work provides a convenient tool for the robotics community, accelerating the development of humanoid robots. The code can be found in https://github.com/BoosterRobotics/booster_gym.
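As a rough illustration of the domain randomization mentioned in the abstract, the sketch below draws a fresh set of physical parameters for each training episode. The parameter names, ranges, and the `sample_domain` helper are hypothetical and are not taken from the booster_gym code; they only show the general idea of perturbing the simulator so that a policy does not overfit to a single set of dynamics.

```python
import random

# Hypothetical randomization ranges; the real framework's parameters may differ.
RANGES = {
    "friction": (0.2, 1.5),
    "base_mass_offset_kg": (-1.0, 2.0),
    "motor_strength_scale": (0.9, 1.1),
}


def sample_domain(rng, ranges=RANGES):
    """Draw one value per parameter, uniformly within its (low, high) range."""
    return {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    for episode in range(3):
        print(episode, sample_domain(rng))
```

In a training loop, a dictionary like this would typically be applied to the simulator at every environment reset.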

HiFAR: Multi-Stage Curriculum Learning for High-Dynamics Humanoid Fall Recovery

Feb 28, 2025

Penghui Chen, Yushi Wang, Changsheng Luo, Wenhan Cai, Mingguo Zhao

Abstract:Humanoid robots encounter considerable difficulties in autonomously recovering from falls, especially within dynamic and unstructured environments. Conventional control methodologies are often inadequate in addressing the complexities associated with high-dimensional dynamics and the contact-rich nature of fall recovery. Meanwhile, reinforcement learning techniques are hindered by issues related to sparse rewards, intricate collision scenarios, and discrepancies between simulation and real-world applications. In this study, we introduce a multi-stage curriculum learning framework, termed HiFAR. This framework employs a staged learning approach that progressively incorporates increasingly complex and high-dimensional recovery tasks, thereby facilitating the robot's acquisition of efficient and stable fall recovery strategies. Furthermore, it enables the robot to adapt its policy to effectively manage real-world fall incidents. We assess the efficacy of the proposed method using a real humanoid robot, showcasing its capability to autonomously recover from a diverse range of falls with high success rates, rapid recovery times, robustness, and generalization.

Robust Quadrupedal Locomotion via Risk-Averse Policy Learning

Sep 01, 2023

Jiyuan Shi, Chenjia Bai, Haoran He, Lei Han, Dong Wang, Bin Zhao, Mingguo Zhao, Xiu Li, Xuelong Li

Abstract:The robustness of legged locomotion is crucial for quadrupedal robots in challenging terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged locomotion, and various methods try to integrate privileged distillation, scene modeling, and external sensors to improve the generalization and robustness of locomotion policies. However, these methods struggle to handle uncertain scenarios such as abrupt terrain changes or unexpected external forces. In this paper, we consider a novel risk-sensitive perspective to enhance the robustness of legged locomotion. Specifically, we employ a distributional value function learned by quantile regression to model the aleatoric uncertainty of environments, and perform risk-averse policy learning by optimizing the worst-case scenarios via a risk distortion measure. Extensive experiments in both simulation environments and a real Aliengo robot demonstrate that our method is efficient in handling various external disturbances, and the resulting policy exhibits improved robustness in harsh and uncertain situations in legged locomotion. Videos are available at https://risk-averse-locomotion.github.io/.

* 8 pages, 5 figures
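The two ingredients named in the abstract, quantile regression and a risk distortion measure, can be sketched in a few lines. The code below is a minimal illustration, not the paper's implementation: it uses the standard pinball loss for a single quantile level and, as an assumed choice of distortion, the conditional value at risk (CVaR), computed as the mean of the lowest fraction of equally weighted quantile estimates. The function names are hypothetical.

```python
def pinball_loss(pred, target, tau):
    """Quantile-regression (pinball) loss for one prediction at quantile level tau."""
    diff = target - pred
    # Under-estimates are weighted by tau, over-estimates by (1 - tau).
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def cvar_from_quantiles(quantiles, alpha):
    """Mean of the lowest alpha-fraction of equally weighted quantile estimates."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    ordered = sorted(quantiles)
    k = max(1, int(len(ordered) * alpha))  # keep at least one quantile
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    value_quantiles = [4.0, 1.0, 3.0, 2.0]  # made-up return quantiles for one state
    print(cvar_from_quantiles(value_quantiles, 0.5))  # risk-averse value estimate
    print(cvar_from_quantiles(value_quantiles, 1.0))  # risk-neutral mean
```

Setting `alpha` below 1 makes the value estimate focus on the worst outcomes, which is the sense in which a policy trained against it is risk-averse.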

Prioritized Hierarchical Compliance Control for Dual-Arm Robot Stable Clamping

Dec 20, 2021

Xiaoyu Ren, Liqun Huang, Mingguo Zhao

Abstract:When a dual-arm robot clamps a rigid object in an environment for human beings, the environment or the collaborating human can impose incidental disturbances on the operated object or the robot arm, leading to clamping failure, damaging the robot, or even hurting the human. This research proposes a prioritized hierarchical compliance control to simultaneously deal with the two types of disturbances in dual-arm robot clamping. First, we use hierarchical quadratic programming (HQP) to solve the robot inverse kinematics under the joint constraints and prioritize the compliance for the disturbance on the object over that on the robot arm. Second, we estimate the disturbance forces through the momentum observer with the F/T sensors and adopt admittance control to realize the compliances. Finally, we perform verification experiments on a 14-DOF position-controlled dual-arm robot, WalkerX, clamping a rigid object stably while realizing the compliance against the disturbances.

* 7 pages, 7 figures, accepted by IEEE ROBIO 2021

Recursive Hierarchical Projection for Whole-Body Control with Task Priority Transition

Sep 22, 2021

Gang Han, Jiajun Wang, Xiaozhu Ju, Mingguo Zhao

Abstract:Redundant robots are expected to execute multiple tasks with different priorities simultaneously. Task priorities must be transitioned for complex task scheduling in whole-body control (WBC). Many methods focus on guaranteeing control continuity during task priority transition, but they inevitably either increase the computational cost or sacrifice task accuracy. This work formulates the WBC problem with task priority transition as a Hierarchical Quadratic Programming (HQP) problem with Recursive Hierarchical Projection (RHP) matrices. The tasks of each level are solved recursively through HQP. We propose the RHP matrix to form the continuously changing projection of each level so that the task priority transition is achieved without increasing the computational cost. Additionally, the recursive approach solves the WBC problem without losing task accuracy. We verify the effectiveness of this scheme through comparative simulations of reactive collision avoidance with multi-task priority transitions.

* 6 pages, 9 figures, submitted to ICRA 2022

Mixed Control for Whole-Body Compliance of a Humanoid Robot

Sep 16, 2021

Xiaozhu Ju, Jiajun Wang, Gang Han, Mingguo Zhao

Abstract:Hierarchical quadratic programming (HQP) is commonly applied to handle strict hierarchies of multiple tasks and the robot's physical inequality constraints during whole-body compliance. However, for the one-step HQP, the solution can oscillate when it is close to the boundary of the constraints. This is because abruptly hitting the bounds gives rise to unrealisable jerks and even infeasible solutions. This paper proposes the mixed control, which blends single-axis model predictive control (MPC) and proportional-derivative (PD) control for whole-body compliance to overcome these deficiencies. The MPC predicts the distances between the bounds and the control target of the critical tasks, and it provides smooth and feasible solutions by prediction and optimisation in advance. However, applying MPC will inevitably increase the computation time. Therefore, to achieve a 500 Hz servo rate, the PD controllers still regulate the other tasks to save computation resources. Also, we use a more efficient null space projection (NSP) whole-body controller instead of the HQP and distribute the single-axis MPCs across four CPU cores for parallel computation. Finally, we validate the desired capabilities of the proposed strategy via simulations and an experiment on the humanoid robot Walker X.

* 6 pages, 5 figures, submitted to ICRA 2022
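The null space projection (NSP) idea referred to in the abstract can be illustrated for the simplest case of one scalar primary task. The sketch below is a generic textbook construction, not the controller from the paper: for a single-row task Jacobian J it computes the joint velocity J⁺ẋ + (I − J⁺J)q̇₀, so the secondary velocity q̇₀ acts only in directions that do not disturb the primary task. The function names are hypothetical, and the Jacobian row is assumed to be nonzero.

```python
def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def null_space_project(jacobian_row, vector):
    """Remove from vector the component along a single-row task Jacobian."""
    scale = dot(jacobian_row, vector) / dot(jacobian_row, jacobian_row)
    return [v - scale * j for v, j in zip(vector, jacobian_row)]


def prioritized_velocity(jacobian_row, task_rate, secondary_velocity):
    """Joint velocity meeting the primary task rate, plus the projected secondary velocity."""
    gain = task_rate / dot(jacobian_row, jacobian_row)  # pseudo-inverse of a single row
    primary = [gain * j for j in jacobian_row]
    residual = null_space_project(jacobian_row, secondary_velocity)
    return [p + r for p, r in zip(primary, residual)]


if __name__ == "__main__":
    jacobian_row = [1.0, 2.0, 2.0]  # made-up Jacobian of a scalar task
    q_dot = prioritized_velocity(jacobian_row, 0.9, [3.0, 0.0, 0.0])
    print(q_dot, dot(jacobian_row, q_dot))  # the task rate is reproduced by J * q_dot
```

With several prioritized tasks, the same projection is applied level by level using matrix pseudo-inverses rather than the single-row shortcut used here.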

Whole-Body Control with Motion/Force Transmissibility for Parallel-Legged Robot

Sep 15, 2021

Jiajun Wang, Gang Han, Xiaozhu Ju, Mingguo Zhao

Abstract:Whole-body control (WBC) has been applied to the locomotion of legged robots. However, current WBC methods have not considered the intrinsic features of parallel mechanisms, especially motion/force transmissibility (MFT). In this work, we propose an MFT-enhanced WBC scheme. Introducing MFT into a WBC is challenging due to the nonlinear relationship between MFT indices and the robot configuration. To overcome this challenge, we establish the MFT preferable space of the robot and formulate it as a polyhedron in the joint space at the acceleration level. Then, the WBC employs the polyhedron as a soft constraint. As a result, the robot possesses high-speed and high-acceleration capabilities by satisfying this constraint as well as staying away from its singularity. In contrast with the WBC without considering MFT, our proposed scheme is more robust to external disturbances, e.g., push recovery and uneven terrain locomotion. Simulations and experiments on a parallel-legged bipedal robot are provided to demonstrate the performance and robustness of the proposed method.

* 6 pages, 7 figures, submitted to ICRA 2022

Dynamic Balancing of Humanoid Robot Walker3 with Proprioceptive Actuation: Systematic Design of Algorithm, Software and Hardware

Aug 09, 2021

Yan Xie, Jiajun Wang, Hao Dong, Xiaoyu Ren, Liqun Huang, Mingguo Zhao

Abstract:Dynamic balancing under uncertain disturbances is important for a humanoid robot and requires a good capability of coordinating the entire body redundancy to execute multiple tasks. Whole-body control (WBC) based on hierarchical optimization has been generally accepted and utilized in torque-controlled robots. A good hierarchy is the prerequisite for WBC and can be predefined according to prior knowledge. However, real-time computation can be problematic in physical applications given the computational complexity of WBC. For robots with proprioceptive actuation, the joint friction in the gear reducer also degrades the torque tracking performance. In our paper, a reasonable hierarchy of tasks and constraints is first customized for robot dynamic balancing. Then a real-time WBC is implemented via computationally efficient WBC software. The method is run on a modular master control system, UBTMaster, characterized by real-time communication and powerful computing capability. After the joint friction is well covered by model identification, extensive experiments on various balancing scenarios are conducted on a humanoid robot, Walker3, with proprioceptive actuation. The robot shows outstanding balance performance even under external impulses and when its two feet independently suffer inclination and shift disturbances. The results demonstrate that, with the strict hierarchy, real-time computation, and joint friction handled carefully, the robot with proprioceptive actuation can manage dynamic physical interactions with unstructured environments well.

* journal

Fast Online Planning for Bipedal Locomotion via Centroidal Model Predictive Gait Synthesis

Feb 26, 2021

Yijie Guo, Mingguo Zhao

Abstract:The planning of whole-body motion and step time for bipedal locomotion is constructed as a model predictive control (MPC) problem, in which a sequence of optimization problems needs to be solved online. While directly solving these problems is extremely time-consuming, we propose a predictive gait synthesizer to solve them quickly online. Based on the full dimensional model, a library of gaits with different speeds and periods is first constructed offline. Then the proposed gait synthesizer generates real-time gaits by synthesizing the gait library based on the online prediction of centroidal dynamics. We prove that the generated gaits are feasible solutions of the MPC optimization problems. Thus our proposed gait synthesizer works as a fast MPC-style planner to guarantee the feasibility and stability of the full dimensional robot. Simulation and experimental results on an 8-degree-of-freedom (DoF) bipedal robot are provided to show the performance and robustness of this approach for walking and standing.

* Submitted to the IEEE for possible publication. Comments are welcome

Brain-inspired global-local hybrid learning towards human-like intelligence

Jun 05, 2020

Yujie Wu, Rong Zhao, Jun Zhu, Feng Chen, Mingkun Xu, Guoqi Li, Sen Song, Lei Deng, Guanrui Wang, Hao Zheng(+4 more)

Abstract:The combination of neuroscience-oriented and computer-science-oriented approaches is the most promising method to develop artificial general intelligence (AGI) that can learn general tasks similar to humans. Currently, two main routes of learning exist, including neuroscience-inspired methods, represented by local synaptic plasticity, and machine-learning methods, represented by backpropagation. Both have advantages and complement each other, but neither can solve all learning problems well. Integrating these two methods into one network may provide better learning abilities for general tasks. Here, we report a hybrid spiking neural network model that integrates the two approaches by introducing a meta-local module and a two-phase causality modelling method. The model can not only optimize local plasticity rules, but also receive top-down supervision information. In addition to flexibly supporting multiple spike-based coding schemes, we demonstrate that this model facilitates learning of many general tasks, including fault-tolerance learning, few-shot learning and multiple-task learning, and show its efficiency on the Tianjic neuromorphic platform. This work provides a new route for brain-inspired computing and facilitates AGI development.

* 5 figures, 2 tables
