Abstract:Learning robotic manipulation skills from vision is a promising approach for developing robotics applications that can generalize broadly to real-world scenarios. As such, many approaches to enable this vision have been explored with fruitful results. Particularly, object-centric representation methods have been shown to provide better inductive biases for skill learning, leading to improved performance and generalization. Nonetheless, we show that object-centric methods can struggle to learn simple manipulation skills in multi-object environments. Thus, we propose DOCIR, an object-centric framework that introduces a disentangled representation for objects of interest, obstacles, and robot embodiment. We show that this approach leads to state-of-the-art performance for learning pick and place skills from visual inputs in multi-object environments and generalizes at test time to changing objects of interest and distractors in the scene. Furthermore, we show its efficacy both in simulation and zero-shot transfer to the real world.
Abstract:Learning skills that interact with objects is of major importance for robotic manipulation. These skills can indeed serve as an efficient prior for solving various manipulation tasks. We propose a novel Skill Learning approach that discovers composable behaviors by solving a large and diverse number of autonomously generated tasks. Our method learns skills allowing the robot to consistently and robustly interact with objects in its environment. The discovered behaviors are embedded in primitives which can be composed with Hierarchical Reinforcement Learning to solve unseen manipulation tasks. In particular, we leverage Asymmetric Self-Play to discover behaviors and Multiplicative Compositional Policies to embed them. We compare our method to Skill Learning baselines and find that our skills are more interactive. Furthermore, the learned skills can be used to solve a set of unseen manipulation tasks, in simulation as well as on a real robotic platform.
Abstract:Self-supervised skill learning aims to acquire useful behaviors that leverage the underlying dynamics of the environment. Latent variable models, based on mutual information maximization, have been particularly successful in this task but still struggle in the context of robotic manipulation. As it requires impacting a possibly large set of degrees of freedom composing the environment, mutual information maximization fails alone in producing useful manipulation behaviors. To address this limitation, we introduce SLIM, a multi-critic learning approach for skill discovery with a particular focus on robotic manipulation. Our main insight is that utilizing multiple critics in an actor-critic framework to gracefully combine multiple reward functions leads to a significant improvement in latent-variable skill discovery for robotic manipulation while overcoming possible interference occurring among rewards which hinders convergence to useful skills. Furthermore, in the context of tabletop manipulation, we demonstrate the applicability of our novel skill discovery approach to acquire safe and efficient motor primitives in a hierarchical reinforcement learning fashion and leverage them through planning, surpassing the state-of-the-art approaches for skill discovery by a large margin.