Abstract:Deep Reinforcement Learning (RL) has emerged as a promising method to develop humanoid robot locomotion controllers. Despite the robust and stable locomotion demonstrated by previous RL controllers, their behavior often lacks the natural and agile motion patterns necessary for human-centric scenarios. In this work, we propose HiLo (human-like locomotion with motion tracking), an effective framework designed to learn RL policies that perform human-like locomotion. The primary challenges of human-like locomotion are complex reward engineering and domain randomization. HiLo overcomes these issues by developing an RL-based motion tracking controller and simple domain randomization through random force injection and action delay. Within the framework of HiLo, the whole-body control problem can be decomposed into two components: One part is solved using an open-loop control method, while the residual part is addressed with RL policies. A distributional value function is also implemented to stabilize the training process by improving the estimation of cumulative rewards under perturbed dynamics. Our experiments demonstrate that the motion tracking controller trained using HiLo can perform natural and agile human-like locomotion while exhibiting resilience to external disturbances in real-world systems. Furthermore, we show that the motion patterns of humanoid robots can be adapted through the residual mechanism without fine-tuning, allowing quick adjustments to task requirements.
Abstract:In Reinforcement Learning (RL), agents aim at maximizing cumulative rewards in a given environment. During the learning process, RL agents face the dilemma of exploitation and exploration: leveraging existing knowledge to acquire rewards or seeking potentially higher ones. Using uncertainty as a guiding principle provides an active and effective approach to solving this dilemma and ensemble-based methods are one of the prominent avenues for quantifying uncertainty. Nevertheless, conventional ensemble-based uncertainty estimation lacks an explicit prior, deviating from Bayesian principles. Besides, this method requires diversity among members to generate less biased uncertainty estimation results. To address the above problems, previous research has incorporated random functions as priors. Building upon these foundational efforts, our work introduces an innovative approach with delicately designed prior NNs, which can incorporate maximal diversity in the initial value functions of RL. Our method has demonstrated superior performance compared with the random prior approaches in solving classic control problems and general exploration tasks, significantly improving sample efficiency.