Abstract: We consider how model-based solvers can be leveraged to guide training of a universal policy to control from any feasible start state to any feasible goal in a contact-rich manipulation setting. While Reinforcement Learning (RL) has demonstrated its strength in such settings, it may struggle to sufficiently explore and discover complex manipulation strategies, especially in sparse-reward settings. Our approach is based on the idea that the feasible, likely-visited states during such manipulation lie on a lower-dimensional manifold, and that RL can be guided with a sampler from this manifold. We propose Sample-Guided RL, which uses model-based constraint solvers to efficiently sample feasible configurations (satisfying differentiable collision, contact, and force constraints) and leverages them to guide RL for universal (goal-conditioned) manipulation policies. We study using these samples directly to bias state visitation, as well as using black-box optimization of open-loop trajectories between randomly sampled configurations to impose a state-visitation bias and, optionally, to add a behavior cloning loss. In a minimalistic double-sphere manipulation setting, Sample-Guided RL discovers complex manipulation strategies and achieves high success rates in reaching any statically stable state. In a more challenging Panda arm setting, our approach achieves a significant success rate over a near-zero baseline and demonstrates a breadth of complex whole-body-contact manipulation strategies.
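To make the state-bias idea concrete, below is a minimal, self-contained sketch (not the paper's implementation): feasible configurations for a toy double-sphere-on-a-plane setting are rejection-sampled and then mixed into episode resets. The constraint check, buffer size, and bias probability are illustrative assumptions standing in for the differentiable collision, contact, and force constraint solver described in the abstract.

```python
import numpy as np

RADIUS = 0.1  # toy sphere radius

def sample_feasible_configuration(rng):
    """Rejection-sample two sphere centers that rest on the plane and do not interpenetrate."""
    while True:
        xy = rng.uniform(-1.0, 1.0, size=(2, 2))            # planar positions of both spheres
        centers = np.hstack([xy, np.full((2, 1), RADIUS)])   # resting on the ground plane z = 0
        if np.linalg.norm(centers[0] - centers[1]) >= 2 * RADIUS:
            return centers

def sample_episode_start(rng, feasible_buffer, default_state, bias_prob=0.5):
    """Bias episode resets toward solver-sampled states with probability bias_prob."""
    if feasible_buffer and rng.random() < bias_prob:
        return feasible_buffer[rng.integers(len(feasible_buffer))]
    return default_state

rng = np.random.default_rng(0)
buffer = [sample_feasible_configuration(rng) for _ in range(100)]
start_state = sample_episode_start(rng, buffer, default_state=buffer[0])
print(start_state)
```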
Abstract: Intelligent interaction with the real world requires robotic agents to jointly reason over high-level plans and low-level controls. Task and motion planning (TAMP) addresses this by combining symbolic planning and continuous trajectory generation. Recently, foundation model approaches to TAMP have presented impressive results, including fast planning times and the execution of natural language instructions. Yet, the optimal interface between high-level planning and low-level motion generation remains an open question: prior approaches are limited by either too much abstraction (e.g., chaining simplified skill primitives) or a lack thereof (e.g., direct joint angle prediction). We introduce a novel technique that employs a form of meta-optimization to address these issues by: (i) using program search over trajectory optimization problems as an interface between a foundation model and robot control, and (ii) leveraging a zero-order method to optimize numerical parameters in the foundation model output. Results on challenging object manipulation and drawing tasks confirm that our proposed method improves over prior TAMP approaches.
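As an illustration of step (ii), the sketch below refines the numerical parameters of a toy trajectory-optimization objective with a simple zero-order random search. The cost function and the hard-coded stand-in for the foundation model's proposed parameters are assumptions for illustration only, not the paper's actual pipeline.

```python
import numpy as np

def trajectory_cost(params, goal=np.array([1.0, 0.5])):
    """Toy objective: a single waypoint should reach the goal while keeping the path short."""
    waypoint = params[:2]
    return np.linalg.norm(waypoint - goal) + 0.1 * np.linalg.norm(waypoint)

def zero_order_refine(initial_params, cost_fn, iters=200, sigma=0.1, seed=0):
    """Simple (1+1)-style random search: keep Gaussian perturbations that lower the cost."""
    rng = np.random.default_rng(seed)
    best = np.asarray(initial_params, dtype=float).copy()
    best_cost = cost_fn(best)
    for _ in range(iters):
        candidate = best + sigma * rng.standard_normal(best.shape)
        candidate_cost = cost_fn(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost

fm_guess = np.array([0.2, 0.2])   # stand-in for numerical parameters proposed by the foundation model
refined, refined_cost = zero_order_refine(fm_guess, trajectory_cost)
print(refined, refined_cost)
```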
Abstract: The increasing complexity of multirotor applications has led to the need for more accurate flight controllers that can reliably predict all forces acting on the robot. Traditional flight controllers model a large part of these forces but do not take so-called residual forces into account. One reason is that accurately computing the residual forces can be computationally expensive. Incremental Nonlinear Dynamic Inversion (INDI) is a method that computes the difference between different sensor measurements in order to estimate these residual forces. The main issue with INDI is its reliance on special sensor measurements, which can be very noisy. Recent work has also shown that residual forces can be predicted using learning-based methods. In this work, we demonstrate that a learning algorithm can predict a smoother version of the INDI outputs without requiring additional sensor measurements. In addition, we introduce a new method that combines learning-based predictions with INDI. We also adapt both approaches to quadrotors carrying a slung-type payload. The results show that using a neural network to predict residual forces can outperform INDI, while the combination of the neural network and INDI can yield even better results than either method individually.
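The sketch below illustrates the general idea rather than the paper's implementation: an INDI-style residual force estimate computed from noisy acceleration measurements, a stand-in for the learned predictor, and a simple convex blend of the two. The mass, noise level, stand-in network, and blending rule are all illustrative assumptions.

```python
import numpy as np

MASS = 1.0                                # kg, toy quadrotor mass
GRAVITY = np.array([0.0, 0.0, -9.81])     # m/s^2

def indi_residual(acc_measured, thrust_modeled):
    """INDI-style estimate: force left unexplained by gravity and the modeled thrust."""
    return MASS * (acc_measured - GRAVITY) - thrust_modeled

def learned_residual(state):
    """Stand-in for a trained network; here just a smooth function of the state."""
    return np.array([0.05, 0.0, -0.1]) * np.tanh(state[:3])

def blended_residual(acc_measured, thrust_modeled, state, alpha=0.7):
    """Convex combination of the (smooth) learned prediction and the (noisy) INDI estimate."""
    return alpha * learned_residual(state) + (1.0 - alpha) * indi_residual(acc_measured, thrust_modeled)

rng = np.random.default_rng(0)
state = np.array([0.5, -0.2, 1.0, 0.0, 0.0, 0.0])
thrust = np.array([0.0, 0.0, MASS * 9.81])                           # nominal hover command
acc_noisy = GRAVITY + thrust / MASS + 0.2 * rng.standard_normal(3)   # noisy acceleration measurement
print(blended_residual(acc_noisy, thrust, state))
```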