Abstract: Large Language Models (LLMs) have recently shown promise as high-level planners for robots when given access to a selection of low-level skills. However, it is often assumed that LLMs do not possess sufficient knowledge to be used for the low-level trajectories themselves. In this work, we address this assumption thoroughly, and investigate whether an LLM (GPT-4) can directly predict a dense sequence of end-effector poses for manipulation skills, when given access to only object detection and segmentation vision models. We study how well a single task-agnostic prompt, without any in-context examples, motion primitives, or external trajectory optimisers, can perform across 26 real-world language-based tasks, such as "open the bottle cap" and "wipe the plate with the sponge", and we investigate which design choices in this prompt are the most effective. Our conclusions raise the assumed limit of LLMs for robotics, and we reveal for the first time that LLMs do indeed possess an understanding of low-level robot control sufficient for a range of common tasks, and that they can additionally detect failures and then re-plan trajectories accordingly. Videos, code, and prompts are available at: https://www.robot-learning.uk/language-models-trajectory-generators.
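As a rough illustration of the pipeline this abstract describes, a minimal sketch is given below. This is not the paper's actual prompt or code: the prompt wording, the pose format, and the `detect_objects` helper are all assumptions made for illustration.

```python
# Minimal sketch, under stated assumptions: a single task-agnostic prompt
# asks GPT-4 to emit a dense sequence of end-effector poses, given only
# object positions from off-the-shelf detection/segmentation models.
import json
from openai import OpenAI

client = OpenAI()

def detect_objects(image) -> dict:
    """Placeholder for an off-the-shelf detection/segmentation pipeline,
    e.g. returning {"bottle": {"x": 0.42, "y": 0.13, "z": 0.05}, ...}."""
    raise NotImplementedError

def plan_trajectory(task: str, image) -> list[dict]:
    objects = detect_objects(image)
    # A hypothetical task-agnostic prompt with no in-context examples,
    # motion primitives, or external trajectory optimisers.
    prompt = (
        "You control a robot arm. Given a task and the 3D positions of the "
        "detected objects, output a JSON list of end-effector poses, "
        "densely sampled along the motion, each of the form "
        '{"x": ..., "y": ..., "z": ..., "roll": ..., "pitch": ..., '
        '"yaw": ..., "gripper": ...}.\n'
        f"Task: {task}\nObjects: {json.dumps(objects)}"
    )
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)
```

In practice the predicted poses would then be tracked by a low-level controller; the paper's reported failure detection and re-planning would correspond to re-invoking the LLM with feedback, which is omitted here.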
Abstract: Warning: this paper contains material which may be offensive or upsetting. While much recent work has focused on the detection of hate speech and overtly offensive content, very little research has explored the more subtle but equally harmful language in the form of implied stereotypes. This is a challenging domain, made even more so by the fact that humans often struggle to understand and reason about stereotypes. We build on existing literature and present CO-STAR (COnceptualisation of STereotypes for Analysis and Reasoning), a novel framework which encodes the underlying concepts of implied stereotypes. We also introduce the CO-STAR training data set, which contains just over 12K structured annotations of implied stereotypes and stereotype conceptualisations, and we achieve state-of-the-art results after training and manual evaluation. The CO-STAR models are, however, limited in their ability to understand more complex and subtly worded stereotypes, and our research motivates future work on models with more sophisticated methods for encoding common-sense knowledge.
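To make "structured annotations" concrete, a sketch of what a CO-STAR-style record might look like is shown below. The field names and schema are assumptions for illustration, not the dataset's actual format, and the placeholder values stand in for real annotated text.

```python
# Illustrative sketch of a CO-STAR-style structured annotation; the exact
# fields and their names are assumptions, not the published schema.
from dataclasses import dataclass

@dataclass
class CoStarAnnotation:
    post: str                 # the original (potentially offensive) text
    implied_stereotype: str   # the stereotype the post implies
    conceptualisation: str    # the underlying concept the framework encodes

example = CoStarAnnotation(
    post="<redacted example post>",
    implied_stereotype="<annotated implied stereotype>",
    conceptualisation="<higher-level concept behind the stereotype>",
)
```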