Abstract: The creation of large, diverse, high-quality robot manipulation datasets is an important stepping stone on the path toward more capable and robust robotic manipulation policies. However, creating such datasets is challenging: collecting robot manipulation data in diverse environments poses logistical and safety challenges and requires substantial investments in hardware and human labour. As a result, even the most general robot manipulation policies today are mostly trained on data collected in a small number of environments with limited scene and task diversity. In this work, we introduce DROID (Distributed Robot Interaction Dataset), a diverse robot manipulation dataset with 76k demonstration trajectories, or 350 hours of interaction data, collected across 564 scenes and 84 tasks by 50 data collectors in North America, Asia, and Europe over the course of 12 months. We demonstrate that training with DROID leads to policies with higher performance and improved generalization ability. We open source the full dataset, policy learning code, and a detailed guide for reproducing our robot hardware setup.
Abstract: It is difficult for robots to retrieve objects in densely cluttered lateral-access scenes with movable objects, as jamming against adjacent objects and walls can inhibit progress. We propose the use of two action primitives, burrowing and excavating, that can fluidize the scene to un-jam obstacles and enable continued progress. Even when these primitives are implemented in an open-loop manner at clock-driven intervals, we observe a decrease in the final distance to the target location. Furthermore, we combine the primitives into a closed-loop hybrid control strategy that uses tactile and proprioceptive information to leverage the advantages of both primitives without being overly disruptive. In doing so, we achieve a 10-fold increase in success rate over the baseline control strategy and significantly improve completion times compared to the primitives alone or a naive combination of them.
Abstract: For a robot to personalize physical assistance effectively, it must learn user preferences that can be generally reapplied to future scenarios. In this work, we investigate personalization of household cleanup with robots that can tidy up rooms by picking up objects and putting them away. A key challenge is determining the proper place to put each object, as people's preferences can vary greatly depending on personal taste or cultural background. For instance, one person may prefer storing shirts in the drawer, while another may prefer them on the shelf. We aim to build systems that can learn such preferences from just a handful of examples via prior interactions with a particular person. We show that robots can combine language-based planning and perception with the few-shot summarization capabilities of large language models (LLMs) to infer generalized user preferences that are broadly applicable to future interactions. This approach enables fast adaptation and achieves 91.2% accuracy on unseen objects in our benchmark dataset. We also demonstrate our approach on a real-world mobile manipulator called TidyBot, which successfully puts away 85.0% of objects in real-world test scenarios.
Abstract: In-hand manipulation of objects without an object model is a foundational skill for many tasks in unstructured environments. In many cases, vision-only approaches may not be feasible, for example due to occlusion in cluttered spaces. In this paper, we introduce a method to reorient unknown objects by incrementally building a probabilistic estimate of the object shape and pose during task-driven manipulation. Our method leverages Bayesian optimization to strategically trade off exploration of the global object shape against efficient task completion. We demonstrate our approach on a Tactile-Enabled Roller Grasper, a gripper that rolls objects in hand while continuously collecting tactile data. We evaluate our method in simulation on a set of randomly generated objects and find that it reliably reorients objects while significantly reducing the exploration time needed to do so. On the Roller Grasper hardware, we show successful qualitative reconstruction of the object model. In summary, this work (1) presents a system capable of simultaneously learning unknown 3D object shape and pose using tactile sensing; and (2) demonstrates that task-driven exploration results in more efficient object manipulation than the common paradigm of complete object exploration before task completion.
Abstract: Dexterous manipulation tasks often require contact switching, where fingers make and break contact with the object. We propose a method that plans trajectories for dexterous manipulation tasks involving contact switching using contact-implicit trajectory optimization (CITO) augmented with a high-level discrete contact sequence planner. We first use the high-level planner to find a sequence of finger contact switches given a desired object trajectory. With this contact sequence plan, we impose additional constraints in the CITO problem. We show that our method finds trajectories approximately 7 times faster than a general CITO baseline for a four-finger planar manipulation scenario. Furthermore, when executing the planned trajectories in a full dynamics simulator, we are able to more closely track the object pose trajectories planned by our method than those planned by the baselines.