Abstract:Leveraging generative AI (for example, Large Language Models) for language understanding within robotics opens up possibilities for LLM-driven robot end-user development (EUD). Despite the numerous design opportunities it provides, little is understood about how this technology can be utilized when constructing robot program logic. In this paper, we outline the background in capturing natural language end-user intent and summarize previous use cases of LLMs within EUD. Taking the context of filmmaking as an example, we explore how a cinematography practitioner's intent to film a certain scene can be articulated using natural language, captured by an LLM, and further parametrized as low-level robot arm movement. We explore the capabilities of an LLM interpreting end-user intent and mapping natural language to predefined, cross-modal data in the process of iterative program development. We conclude by suggesting future opportunities for domain exploration beyond cinematography to support language-driven robotic camera navigation.
Abstract:Robots in real-world environments continuously engage with multiple users and encounter changes that lead to unexpected conflicts in fulfilling user requests. Recent technical advancements (e.g., large-language models (LLMs), program synthesis) offer various methods for automatically generating repair plans that address such conflicts. In this work, we understand how automated repair and explanations can be designed to improve user experience with robot failures through two user studies. In our first, online study ($n=162$), users expressed increased trust, satisfaction, and utility with the robot performing automated repair and explanations. However, we also identified risk factors -- safety, privacy, and complexity -- that require adaptive repair strategies. The second, in-person study ($n=24$) elucidated distinct repair and explanation strategies depending on the level of risk severity and type. Using a design-based approach, we explore automated repair with explanations as a solution for robots to handle conflicts and failures, complemented by adaptive strategies for risk factors. Finally, we discuss the implications of incorporating such strategies into robot designs to achieve seamless operation among changing user needs and environments.
Abstract:Teleoperation systems map operator commands from an input device into some coordinate frame in the remote environment. This frame, which we call a control coordinate system, should be carefully chosen as it determines how operators should move to get desired robot motions. While specific choices made by individual systems have been described in prior work, a design space, i.e., an abstraction that encapsulates the range of possible options, has not been codified. In this paper, we articulate a design space of control coordinate systems, which can be defined by choosing a direction in the remote environment for each axis of the input device. Our key insight is that there is a small set of meaningful directions in the remote environment. Control coordinate systems in prior works can be organized by the alignments of their axes with these directions and new control coordinate systems can be designed by choosing from these directions. We also provide three design criteria to reason about the suitability of control coordinate systems for various scenarios. To demonstrate the utility of our design space, we use it to organize prior systems and design control coordinate systems for three scenarios that we assess through human-subject experiments. Our results highlight the promise of our design space as a conceptual tool to assist system designers to design control coordinate systems that are effective and intuitive for operators.
Abstract:We investigate how robotic camera systems can offer new capabilities to computer-supported cooperative work through the design, development, and evaluation of a prototype system called Periscope. With Periscope, a local worker completes manipulation tasks with guidance from a remote helper who observes the workspace through a camera mounted on a semi-autonomous robotic arm that is co-located with the worker. Our key insight is that the helper, the worker, and the robot should all share responsibility of the camera view-an approach we call shared camera control. Using this approach, we present a set of modes that distribute the control of the camera between the human collaborators and the autonomous robot depending on task needs. We demonstrate the system's utility and the promise of shared camera control through a preliminary study where 12 dyads collaboratively worked on assembly tasks and discuss design and research implications of our work for future robotic camera system that facilitate remote collaboration.
Abstract:Robotic technology can support the creation of new tools that improve the creative process of cinematography. It is crucial to consider the specific requirements and perspectives of industry professionals when designing and developing these tools. In this paper, we present the results from exploratory interviews with three cinematography practitioners, which included a demonstration of a prototype robotic system. We identified many factors that can impact the design, adoption, and use of robotic support for cinematography, including: (1) the ability to meet requirements for cost, quality, mobility, creativity, and reliability; (2) the compatibility and integration of tools with existing workflows, equipment, and software; and (3) the potential for new creative opportunities that robotic technology can open up. Our findings provide a starting point for future co-design projects that aim to support the work of cinematographers with collaborative robots.
Abstract:Generating feasible robot motions in real-time requires achieving multiple tasks (i.e., kinematic requirements) simultaneously. These tasks can have a specific goal, a range of equally valid goals, or a range of acceptable goals with a preference toward a specific goal. To satisfy multiple and potentially competing tasks simultaneously, it is important to exploit the flexibility afforded by tasks with a range of goals. In this paper, we propose a real-time motion generation method that accommodates all three categories of tasks within a single, unified framework and leverages the flexibility of tasks with a range of goals to accommodate other tasks. Our method incorporates tasks in a weighted-sum multiple-objective optimization structure and uses barrier methods with novel loss functions to encode the valid range of a task. We demonstrate the effectiveness of our method through a simulation experiment that compares it to state-of-the-art alternative approaches, and by demonstrating it on a physical camera-in-hand robot that shows that our method enables the robot to achieve smooth and feasible camera motions.
Abstract:Drones can provide a minimally-constrained adapting camera view to support robot telemanipulation. Furthermore, the drone view can be automated to reduce the burden on the operator during teleoperation. However, existing approaches do not focus on two important aspects of using a drone as an automated view provider. The first is how the drone should select from a range of quality viewpoints within the workspace (e.g., opposite sides of an object). The second is how to compensate for unavoidable drone pose uncertainty in determining the viewpoint. In this paper, we provide a nonlinear optimization method that yields effective and adaptive drone viewpoints for telemanipulation with an articulated manipulator. Our first key idea is to use sparse human-in-the-loop input to toggle between multiple automatically-generated drone viewpoints. Our second key idea is to introduce optimization objectives that maintain a view of the manipulator while considering drone uncertainty and the impact on viewpoint occlusion and environment collisions. We provide an instantiation of our drone viewpoint method within a drone-manipulator remote teleoperation system. Finally, we provide an initial validation of our method in tasks where we complete common household and industrial manipulations.
Abstract:Human demonstrations are important in a range of robotics applications, and are created with a variety of input methods. However, the design space for these input methods has not been extensively studied. In this paper, focusing on demonstrations of hand-scale object manipulation tasks to robot arms with two-finger grippers, we identify distinct usage paradigms in robotics that utilize human-to-robot demonstrations, extract abstract features that form a design space for input methods, and characterize existing input methods as well as a novel input method that we introduce, the instrumented tongs. We detail the design specifications for our method and present a user study that compares it against three common input methods: free-hand manipulation, kinesthetic guidance, and teleoperation. Study results show that instrumented tongs provide high quality demonstrations and a positive experience for the demonstrator while offering good correspondence to the target robot.
Abstract:Non-contact estimation of respiratory pattern (RP) and respiration rate (RR) has multiple applications. Existing methods for RP and RR measurement fall into one of the three categories - (i) estimation through nasal air flow measurement, (ii) estimation from video-based remote photoplethysmography, and (iii) estimation by measurement of motion induced by respiration using motion detectors. These methods, however, require specialized sensors, are computationally expensive and/or critically depend on selection of a region of interest (ROI) for processing. In this paper a general framework is described for estimating a periodic signal driving noisy LTI channels connected in parallel with unknown dynamics. The method is then applied to derive a computationally inexpensive method for estimating RP using 2D cameras that does not critically depend on ROI. Specifically, RP is estimated by imaging the changes in the reflected light caused by respiration-induced motion. Each spatial location in the field of view of the camera is modeled as a noise-corrupted linear time-invariant (LTI) measurement channel with unknown system dynamics, driven by a single generating respiratory signal. Estimation of RP is cast as a blind deconvolution problem and is solved through a method comprising subspace projection and statistical aggregation. Experiments are carried out on 31 healthy human subjects by generating multiple RPs and comparing the proposed estimates with simultaneously acquired ground truth from an impedance pneumograph device. The proposed estimator agrees well with the ground truth device in terms of correlation measures, despite variability in clothing pattern, angle of view and ROI.