Abstract:Loco-manipulation calls for effective whole-body control and contact-rich interactions with the object and the environment. Existing learning-based control frameworks rely on task-specific engineered rewards, training a set of low-level skill policies and explicitly switching between them with a high-level policy or FSM, leading to quasi-static and fragile transitions between skills. In contrast, for solving highly dynamic tasks such as soccer, the robot should run towards the ball, decelerating into an optimal approach configuration to seamlessly switch to dribbling and eventually score a goal - a continuum of smooth motion. To this end, we propose to learn a single Oracle Guided Multi-mode Policy (OGMP) for mastering all the required modes and transition maneuvers to solve uni-object bipedal loco-manipulation tasks. Specifically, we design a multi-mode oracle as a closed loop state-reference generator, viewing it as a hybrid automaton with continuous reference generating dynamics and discrete mode jumps. Given such an oracle, we then train an OGMP through bounded exploration around the generated reference. Furthermore, to enforce the policy to learn the desired sequence of mode transitions, we present a novel task-agnostic mode-switching preference reward that enhances performance. The proposed approach results in successful dynamic loco-manipulation in omnidirectional soccer and box-moving tasks with a 16-DoF bipedal robot HECTOR. Supplementary video results are available at https://www.youtube.com/watch?v=gfDaRqobheg
Abstract:Amidst task-specific learning-based control synthesis frameworks that achieve impressive empirical results, a unified framework that systematically constructs an optimal policy for sufficiently solving a general notion of a task is absent. Hence, we propose a theoretical framework for a task-centered control synthesis leveraging two critical ideas: 1) oracle-guided policy optimization for the non-limiting integration of sub-optimal task-based priors to guide the policy optimization and 2) task-vital multimodality to break down solving a task into executing a sequence of behavioral modes. The proposed approach results in highly agile parkour and diving on a 16-DoF dynamic bipedal robot. The obtained policy advances indefinitely on a track, performing leaps and jumps of varying lengths and heights for the parkour task. Corresponding to the dive task, the policy demonstrates front, back, and side flips from various initial heights. Finally, we introduce a novel latent mode space reachability analysis to study our policies' versatility and generalization by computing a feasible mode set function through which we certify a set of failure-free modes for our policy to perform at any given state.