Abstract: The combination of behavioural cloning and neural networks has driven significant progress in robotic manipulation. As these algorithms may require a large number of demonstrations for each task of interest, they remain fundamentally inefficient in complex scenarios. This issue is aggravated when the system is treated as a black box and its physical properties are ignored. This work characterises widespread properties of robotic manipulation, such as pose equivariance and locality. We empirically demonstrate that transformations arising from each of these properties allow neural policies trained with behavioural cloning to better generalise to out-of-distribution problem instances.
Abstract: Collecting manipulation demonstrations with robotic hardware is tedious and thus difficult to scale. Recording data on robot hardware ensures that it is in the appropriate format for Learning from Demonstrations (LfD) methods. By contrast, humans are proficient manipulators, and recording their actions would be easy to scale, but it is challenging to use that data format with LfD methods. The question we explore is whether there is a method to collect data in a format that can be used with LfD while retaining some of the attractive features of recording human manipulation. We propose equipping humans with hand-held, hand-actuated parallel grippers and a head-mounted camera to record demonstrations of manipulation tasks. Using customised and reproducible grippers, we collect an initial dataset of common manipulation tasks. We show that there are tasks that, against our initial intuition, can be performed using parallel grippers. By comparing the strategies used to complete tasks with human hands and with grippers, we obtain qualitative insights into the impact of the difference in morphology on LfD. Our data collection method bridges the gap between robot-native and human-native manipulation demonstrations. By making the design of our gripper prototype available, we hope to reduce other researchers' effort to collect manipulation data.