Eric Rosen

Verifiably Following Complex Robot Instructions with Foundation Models

Feb 18, 2024

Benedict Quartey, Eric Rosen, Stefanie Tellex, George Konidaris

Abstract: Enabling robots to follow complex natural language instructions is an important yet challenging problem. People want to flexibly express constraints, refer to arbitrary landmarks and verify behavior when instructing robots. Conversely, robots must disambiguate human instructions into specifications and ground instruction referents in the real world. We propose Language Instruction grounding for Motion Planning (LIMP), a system that leverages foundation models and temporal logics to generate instruction-conditioned semantic maps that enable robots to verifiably follow expressive and long-horizon instructions with open vocabulary referents and complex spatiotemporal constraints. In contrast to prior methods for using foundation models in robot task execution, LIMP constructs an explainable instruction representation that reveals the robot's alignment with an instructor's intended motives and affords the synthesis of robot behaviors that are correct-by-construction. We demonstrate LIMP in three real-world environments, across a set of 35 complex spatiotemporal instructions, showing the generality of our approach and the ease of deployment in novel unstructured domains. In our experiments, LIMP can spatially ground open-vocabulary referents and synthesize constraint-satisfying plans in 90% of object-goal navigation and 71% of mobile manipulation instructions. See supplementary videos at https://robotlimp.github.io

Language-Conditioned Observation Models for Visual Object Search

Sep 13, 2023

Thao Nguyen, Vladislav Hrosinkov, Eric Rosen, Stefanie Tellex

Abstract: Object search is a challenging task because when given complex language descriptions (e.g., "find the white cup on the table"), the robot must move its camera through the environment and recognize the described object. Previous works map language descriptions to a set of fixed object detectors with predetermined noise models, but these approaches are challenging to scale because new detectors need to be made for each object. In this work, we bridge the gap in realistic object search by posing the search problem as a partially observable Markov decision process (POMDP) where the object detector and visual sensor noise in the observation model are determined by a single deep neural network conditioned on complex language descriptions. We incorporate the neural network's outputs into our language-conditioned observation model (LCOM) to represent dynamically changing sensor noise. With an LCOM, any language description of an object can be used to generate an appropriate object detector and noise model, and training an LCOM only requires readily available supervised image-caption datasets. We empirically evaluate our method by comparing against a state-of-the-art object search algorithm in simulation, and demonstrate that planning with our observation model yields a significantly higher average task completion rate (from 0.46 to 0.66) and more efficient and quicker object search than with a fixed-noise model. We demonstrate our method on a Boston Dynamics Spot robot, enabling it to handle complex natural language object descriptions and efficiently find objects in a room-scale environment.
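
As a rough illustration of how a detector whose noise depends on the language description could enter a POMDP belief update, the sketch below runs one Bayes-filter step over a handful of grid cells. The function name `update_belief`, the cell layout, and the true/false-positive rates are hypothetical; in the paper those rates would come from the neural network, which is not reproduced here.

```python
def update_belief(belief, viewed_cell, detected, tpr, fpr):
    """One Bayes-filter step for a single-object search.

    belief: list of probabilities, one per cell, that the object is there.
    viewed_cell: index of the cell the camera is currently looking at.
    detected: True if the detector fired on this view.
    tpr, fpr: assumed true-/false-positive rates for this description
              (stand-ins for the outputs of a language-conditioned model).
    """
    posterior = []
    for cell, prior in enumerate(belief):
        if cell == viewed_cell:
            # Object really is in view: the detector fires with probability tpr.
            likelihood = tpr if detected else 1.0 - tpr
        else:
            # Object is elsewhere: a detection would be a false positive.
            likelihood = fpr if detected else 1.0 - fpr
        posterior.append(likelihood * prior)

    total = sum(posterior)
    if total == 0.0:
        raise ValueError("observation has zero probability under the belief")
    return [p / total for p in posterior]


# A description the detector handles well sharpens the belief more than
# one it handles poorly, even though the observation is the same.
uniform = [0.25, 0.25, 0.25, 0.25]
sharp = update_belief(uniform, viewed_cell=2, detected=True, tpr=0.9, fpr=0.1)
vague = update_belief(uniform, viewed_cell=2, detected=True, tpr=0.6, fpr=0.4)
```

With a fixed-noise model, `tpr` and `fpr` would be constants; the point of the sketch is only that letting them vary per description changes how much each observation moves the belief.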

A Virtual Reality Teleoperation Interface for Industrial Robot Manipulators

May 18, 2023

Eric Rosen, Devesh K. Jha

Abstract: We address the problem of teleoperating an industrial robot manipulator via a commercially available Virtual Reality (VR) interface. Previous works on VR teleoperation for robot manipulators focus primarily on collaborative or research robot platforms (whose dynamics and constraints differ from industrial robot arms), or only address tasks where the robot's dynamics are not as important (e.g., pick and place tasks). We investigate the usage of commercially available VR interfaces for effectively teleoperating industrial robot manipulators in a variety of contact-rich manipulation tasks. We find that applying standard practices for VR control of robot arms is challenging for industrial platforms because torque and velocity control are not exposed, and position control is mediated through a black-box controller. To mitigate these problems, we propose a simplified filtering approach to process command signals to enable operators to effectively teleoperate industrial robot arms with VR interfaces in dexterous manipulation tasks. We hope our findings will help robot practitioners implement and set up effective VR teleoperation interfaces for robot manipulators. The proposed method is demonstrated on a variety of contact-rich manipulation tasks which can also involve very precise movement of the robot during execution (videos can be found at https://www.youtube.com/watch?v=OhkCB9mOaBc).

* 7 pages, 6 figures

Planning with Large Language Models via Corrective Re-prompting

Nov 17, 2022

Shreyas Sundara Raman, Vanya Cohen, Eric Rosen, Ifrah Idrees, David Paulius, Stefanie Tellex

Abstract: Extracting the common sense knowledge present in Large Language Models (LLMs) offers a path to designing intelligent, embodied agents. Related works have queried LLMs with a wide range of contextual information, such as goals, sensor observations and scene descriptions, to generate high-level action plans for specific tasks; however, these approaches often involve human intervention or additional machinery to enable sensor-motor interactions. In this work, we propose a prompting-based strategy for extracting executable plans from an LLM, which leverages a novel and readily accessible source of information: precondition errors. Our approach assumes that actions are only afforded execution in certain contexts, i.e., implicit preconditions must be met for an action to execute (e.g., a door must be unlocked to open it), and that the embodied agent has the ability to determine whether the action is executable in the current context (e.g., detect if a precondition error is present). When an agent is unable to execute an action, our approach re-prompts the LLM with precondition error information to extract an executable corrective action to achieve the intended goal in the current context. We evaluate our approach in the VirtualHome simulation environment on 88 different tasks and 7 scenes. We evaluate different prompt templates and compare to methods that naively re-sample actions from the LLM. Our approach, using precondition errors, improves executability and semantic correctness of plans, while also reducing the number of re-prompts required when querying actions.

* 21 pages, 7 figures, Accepted to Foundation Models for Decision Making Workshop at Neural Information Processing Systems 2022
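
The re-prompting loop described in this abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `llm` and `try_execute` are hypothetical callables standing in for a real language model and simulator, and the prompt wording is invented for the example.

```python
def plan_with_reprompting(goal, llm, try_execute, max_steps=10, max_reprompts=3):
    """Build a plan step by step, re-prompting on precondition errors.

    llm: callable(prompt) -> str, returns the next action or "DONE"
         (hypothetical stand-in for a real model).
    try_execute: callable(action) -> error message string if a
         precondition is violated, else None (hypothetical stand-in
         for the agent or simulator).
    """
    plan = []
    for _ in range(max_steps):
        action = llm(f"Goal: {goal}\nActions so far: {plan}\nNext action:")
        if action == "DONE":
            break

        error = try_execute(action)
        reprompts = 0
        while error is not None and reprompts < max_reprompts:
            # Feed the precondition error back rather than blindly re-sampling.
            action = llm(
                f"Goal: {goal}\nActions so far: {plan}\n"
                f"Action '{action}' failed: {error}\nCorrective action:"
            )
            reprompts += 1
            error = try_execute(action)

        if error is not None:
            # No executable action found within the re-prompt budget.
            return plan
        plan.append(action)
    return plan
```

With a stub model that answers "unlock door" when told that "open door" failed because the door is locked, the loop would yield a plan that unlocks the door first and then opens it.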

Structural Biases for Improving Transformers on Translation into Morphologically Rich Languages

Aug 11, 2022

Paul Soulos, Sudha Rao, Caitlin Smith, Eric Rosen, Asli Celikyilmaz, R. Thomas McCoy, Yichen Jiang, Coleman Haley, Roland Fernandez, Hamid Palangi (+2 more)

Abstract: Machine translation has seen rapid progress with the advent of Transformer-based models. These models have no explicit linguistic structure built into them, yet they may still implicitly learn structured relationships by attending to relevant tokens. We hypothesize that this structural learning could be made more robust by explicitly endowing Transformers with a structural bias, and we investigate two methods for building in such a bias. One method, the TP-Transformer, augments the traditional Transformer architecture to include an additional component to represent structure. The second method imbues structure at the data level by segmenting the data with morphological tokenization. We test these methods on translating from English into morphologically rich languages, Turkish and Inuktitut, and consider both automatic metrics and human evaluations. We find that each of these two approaches allows the network to achieve better performance, but this improvement is dependent on the size of the dataset. In sum, structural encoding methods make Transformers more sample-efficient, enabling them to perform better from smaller amounts of data.

* Proceedings of the 4th Workshop on Technologies for MT of Low Resource Languages (LoResMT2021)
* Revised edition to 4th Workshop on Technologies for MT of Low Resource Languages

CHARM: A Hierarchical Deep Learning Model for Classification of Complex Human Activities Using Motion Sensors

Jul 16, 2022

Eric Rosen, Doruk Senkal

Abstract: In this paper, we report a hierarchical deep learning model for classification of complex human activities using motion sensors. In contrast to traditional Human Activity Recognition (HAR) models used for event-based activity recognition, such as step counting, fall detection, and gesture identification, this new deep learning model, which we refer to as CHARM (Complex Human Activity Recognition Model), is aimed at recognizing high-level human activities that are composed of multiple different low-level activities in a non-deterministic sequence, such as meal preparation, house chores, and daily routines. CHARM not only quantitatively outperforms state-of-the-art supervised learning approaches for high-level activity recognition in terms of average accuracy and F1 scores, but also automatically learns to recognize low-level activities, such as manipulation gestures and locomotion modes, without any explicit labels for such activities. This opens new avenues for Human-Machine Interaction (HMI) modalities using wearable sensors, where the user can choose to associate an automated task with a high-level activity, such as controlling home automation (e.g., robotic vacuum cleaners, lights, and thermostats) or presenting contextually relevant information at the right time (e.g., reminders, status updates, and weather/news reports). In addition, the ability to learn low-level user activities when trained using only high-level activity labels may pave the way to semi-supervised learning of HAR tasks that are inherently difficult to label.

* 8 pages, 5 figures

Skill Transfer for Temporally-Extended Task Specifications

Jun 10, 2022

Jason Xinyu Liu, Ankit Shah, Eric Rosen, George Konidaris, Stefanie Tellex

Abstract: Deploying robots in real-world domains, such as households and flexible manufacturing lines, requires the robots to be taskable on demand. Linear temporal logic (LTL) is a widely-used specification language with a compositional grammar that naturally induces commonalities across tasks. However, the majority of prior research on reinforcement learning with LTL specifications treats every new formula independently. We propose LTL-Transfer, a novel algorithm that enables subpolicy reuse across tasks by segmenting policies for training tasks into portable transition-centric skills capable of satisfying a wide array of unseen LTL specifications while respecting safety-critical constraints. Our experiments in a Minecraft-inspired domain demonstrate the capability of LTL-Transfer to satisfy over 90% of 500 unseen tasks while training on only 50 task specifications and never violating a safety constraint. We also deployed LTL-Transfer on a quadruped mobile manipulator in a household environment to show its ability to transfer to many fetch and delivery tasks in a zero-shot fashion.

Learning robot motor skills with mixed reality

Mar 21, 2022

Eric Rosen, Sreehari Rammohan, Devesh Jha

Abstract: Mixed Reality (MR) has recently shown great success as an intuitive interface for enabling end-users to teach robots. Related works have used MR interfaces to communicate robot intents and beliefs to a co-located human, as well as developed algorithms for taking multi-modal human input and learning complex motor behaviors. Even with these successes, enabling end-users to teach robots complex motor tasks still poses a challenge because end-user communication is highly task dependent and world knowledge is highly varied. We propose a learning framework where end-users teach robots a) motion demonstrations, b) task constraints, c) planning representations, and d) object information, all of which are integrated into a single motor skill learning framework based on Dynamic Movement Primitives (DMPs). We hypothesize that conveying this world knowledge will be intuitive with an MR interface, and that a sample-efficient motor skill learning framework which incorporates varied modalities of world knowledge will enable robots to effectively solve complex tasks.

* VAM-HRI 2022

Scalable knowledge base completion with superposition memories

Oct 24, 2021

Matthias Lalisse, Eric Rosen, Paul Smolensky

Abstract: We present Harmonic Memory Networks (HMem), a neural architecture for knowledge base completion that models entities as weighted sums of pairwise bindings between an entity's neighbors and corresponding relations. Since entities are modeled as aggregated neighborhoods, representations of unseen entities can be generated on the fly. We demonstrate this with two new datasets: WNGen and FBGen. Experiments show that the model is state-of-the-art on benchmarks, and flexible enough to evolve without retraining as the knowledge graph grows.
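
To make "weighted sums of pairwise bindings" concrete, here is a small sketch that builds an entity representation by superposing relation-neighbor bindings. It assumes the binding is an outer product (a tensor-product binding), which the abstract does not state; the function names, vectors, and weights are all hypothetical and chosen only for illustration.

```python
def outer(u, v):
    """Outer product of two vectors, used here as the (assumed) binding."""
    return [[a * b for b in v] for a in u]


def entity_memory(neighborhood):
    """Superpose weighted relation-neighbor bindings into one matrix.

    neighborhood: list of (weight, relation_vec, neighbor_vec) triples.
    """
    rows = len(neighborhood[0][1])
    cols = len(neighborhood[0][2])
    memory = [[0.0] * cols for _ in range(rows)]
    for weight, relation, neighbor in neighborhood:
        binding = outer(relation, neighbor)
        for i in range(rows):
            for j in range(cols):
                memory[i][j] += weight * binding[i][j]
    return memory


def unbind(memory, relation):
    """Query the memory with a relation vector (relation^T @ memory)."""
    cols = len(memory[0])
    return [
        sum(relation[i] * memory[i][j] for i in range(len(relation)))
        for j in range(cols)
    ]


# Two orthonormal relation vectors and two toy neighbor embeddings.
r1, r2 = [1.0, 0.0], [0.0, 1.0]
n1, n2 = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
memory = entity_memory([(0.5, r1, n1), (2.0, r2, n2)])
```

Because the representation is just a sum over the neighborhood, an unseen entity can be encoded from its neighbors without retraining, and with orthonormal relation vectors `unbind(memory, r1)` returns the weighted neighbor bound to `r1`.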

TOKCS: Tool for Organizing Key Characteristics of VAM-HRI Systems

Aug 07, 2021

Thomas R. Groechel, Michael E. Walker, Christine T. Chang, Eric Rosen, Jessica Zosa Forde

Abstract: Frameworks have begun to emerge to categorize Virtual, Augmented, and Mixed Reality (VAM) technologies that provide immersive, intuitive interfaces to facilitate Human-Robot Interaction. These frameworks, however, fail to capture key characteristics of the growing subfield of VAM-HRI and can be difficult to consistently apply. This work builds upon these prior frameworks through the creation of a Tool for Organizing Key Characteristics of VAM-HRI Systems (TOKCS). TOKCS discretizes the continuous scales used within prior works for more consistent classification and adds characteristics related to a robot's internal model, anchor locations, manipulability, and the system's software and hardware. To showcase the tool's capability, TOKCS is applied to find trends and takeaways from the fourth VAM-HRI workshop. These trends highlight the expressive capability of TOKCS while also helping frame newer trends and future work recommendations for VAM-HRI research.
