Abstract:Despite the recent advancements in robotics and machine learning (ML), the deployment of autonomous robots in our everyday lives is still an open challenge. This is due to multiple reasons, among them their frequent mistakes, such as interrupting people or responding with delays, as well as their limited ability to understand human speech, e.g., failures in tasks like transcribing speech to text. These mistakes may disrupt interactions and negatively influence human perception of these robots. To address this problem, robots need to be able to detect human-robot interaction (HRI) failures. The ERR@HRI 2024 challenge tackles this by offering a benchmark multimodal dataset of robot failures during human-robot interactions, encouraging researchers to develop and benchmark multimodal machine learning models that detect these failures. We created a dataset featuring multimodal non-verbal interaction data, including facial, speech, and pose features extracted from video clips of interactions with a robotic coach, annotated with labels indicating the presence or absence of robot mistakes, user awkwardness, and interaction ruptures, allowing for the training and evaluation of predictive models. Challenge participants have been invited to submit their multimodal ML models for the detection of robot errors, to be evaluated against performance metrics such as accuracy, precision, recall, and F1 score, with and without a margin of error that reflects the time sensitivity of these metrics. The results of this challenge will help the research community better understand robot failures in human-robot interaction and design autonomous robots that can mitigate their own errors after successfully detecting them.
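As a concrete illustration of the time-sensitive evaluation described above, the following Python sketch computes precision, recall, and F1 for predicted failure onsets with an optional temporal margin. It is a hypothetical example, not the official ERR@HRI scoring code; all function and variable names are illustrative.

```python
# Hypothetical sketch (not the official ERR@HRI scoring code): computing
# precision/recall/F1 for failure-onset detections, with an optional temporal
# margin so that a prediction within +/- `margin` frames of an annotated
# failure onset still counts as correct.

def evaluate_detections(pred_onsets, true_onsets, margin=0):
    """pred_onsets / true_onsets: lists of frame indices where a failure is
    predicted / annotated. `margin` is the tolerance in frames."""
    matched_true = set()
    tp = 0
    for p in pred_onsets:
        # A prediction is a true positive if it lands within `margin`
        # frames of a not-yet-matched ground-truth onset.
        for i, t in enumerate(true_onsets):
            if i not in matched_true and abs(p - t) <= margin:
                matched_true.add(i)
                tp += 1
                break
    fp = len(pred_onsets) - tp
    fn = len(true_onsets) - tp
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: with margin=0 only exact matches count; margin=30 (~1 s at 30 fps)
# credits slightly early or late detections.
print(evaluate_detections([102, 480], [100, 500], margin=0))   # strict
print(evaluate_detections([102, 480], [100, 500], margin=30))  # tolerant
```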
Abstract:Human-robot interaction needs to be studied in the wild. In the summers of 2022 and 2023, we deployed two trash barrel service robots in public spaces, using a Wizard-of-Oz protocol, to study human-robot interactions in urban settings. We deployed the robots at two public plazas in downtown Manhattan and Brooklyn for a combined 20 hours of field time. To date, relatively few long-term human-robot interaction studies have been conducted in shared public spaces. To support researchers aiming to fill this gap, we share insights and lessons learned on deploying robots in public spaces that can benefit both researchers and practitioners. We share these best practices and lessons learned with the HRI research community to encourage more in-the-wild research on robots in public spaces and call for the community to contribute their own lessons learned to a shared GitHub repository.
Abstract:This paper introduces our dataset featuring human-robot interactions (HRI) in urban public environments. The dataset is rich with social signals that we believe can be modeled to help understand naturalistic human-robot interaction. It currently comprises approximately 15 hours of video footage recorded from the robots' perspectives, within which we annotated a total of 274 observable interactions covering a wide range of naturalistic human-robot encounters. The data was collected by two mobile trash barrel robots deployed in Astor Place, New York City, over the course of a week. We invite the HRI community to access and utilize our dataset. To the best of our knowledge, this is the first dataset showcasing robot deployments in a completely public, uncontrolled setting involving urban residents.
Abstract:Machine learning models are commonly tested in-distribution (i.e., on the same dataset); performance almost always drops in out-of-distribution settings. For HRI research, the goal is often to develop generalized models, which makes domain generalization, retaining performance across different settings, a critical issue. In this study, we present a concise analysis of domain generalization in failure detection models trained on human facial expressions. Using two distinct datasets of humans reacting to videos in which errors occur, one from a controlled lab setting and another collected online, we trained deep learning models on each dataset. When testing these models on the alternate dataset, we observed a significant performance drop. We reflect on the causes of the observed model behavior and offer recommendations. This work emphasizes the need for HRI research focused on improving model robustness and real-life applicability.
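The cross-dataset protocol described above can be summarized in a few lines of Python. This is a minimal sketch under simplifying assumptions: the study trains deep learning models on facial-expression data, whereas the stand-in below uses a linear classifier on pre-extracted features purely to keep the example self-contained; all names and the synthetic data are illustrative.

```python
# Minimal sketch of a train-on-one, test-on-the-other (domain generalization)
# evaluation. A linear classifier stands in for the deep models used in the
# study; the feature arrays and labels here are synthetic placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def cross_dataset_eval(X_a, y_a, X_b, y_b):
    """Train on one dataset, test on the other, in both directions."""
    results = {}
    splits = {
        "lab->online": (X_a, y_a, X_b, y_b),
        "online->lab": (X_b, y_b, X_a, y_a),
    }
    for name, (X_tr, y_tr, X_te, y_te) in splits.items():
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        results[name] = f1_score(y_te, clf.predict(X_te))
    return results

# Synthetic stand-in data: 200 samples x 17 facial-feature dimensions each.
rng = np.random.default_rng(0)
X_lab, y_lab = rng.normal(size=(200, 17)), rng.integers(0, 2, 200)
X_web, y_web = rng.normal(loc=0.5, size=(200, 17)), rng.integers(0, 2, 200)
print(cross_dataset_eval(X_lab, y_lab, X_web, y_web))
```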
Abstract:Scaffolds, also called sidewalk sheds, are intended to be temporary structures that protect pedestrians from construction and repair hazards. However, some sidewalk sheds are left up for years. Long-term scaffolding becomes an eyesore, creates accessibility issues on sidewalks, and gives cover to illicit activity. Today, there are over 8,000 active permits for scaffolds in NYC; the more problematic scaffolds are likely expired or unpermitted. This research uses computer vision on street-level imagery to develop a longitudinal map of scaffolding throughout the city. Using a dataset of 29,156,833 dashcam images taken between August 2023 and January 2024, we develop an algorithm to track the presence of scaffolding over time. We also design and implement methods to match detected scaffolds to reported locations of active scaffolding permits, enabling the identification of sidewalk sheds without corresponding permits. We identify 850,766 images of scaffolding, tagging 5,156 active sidewalk sheds and estimating 529 unpermitted sheds. We discuss the implications of an in-the-wild scaffolding classifier for urban tech, innovations to governmental inspection processes, and out-of-distribution evaluations outside of New York City.
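The permit-matching step can be illustrated with a simple geospatial rule: flag a detected scaffold as potentially unpermitted if no active permit lies within a small radius. The sketch below is an assumption-laden illustration, not the paper's implementation; the radius, coordinates, and function names are invented for the example.

```python
# Illustrative sketch (not the paper's method) of matching detected scaffolding
# locations to permitted sheds: a detection is flagged as potentially
# unpermitted if no active permit lies within `radius_m` meters.
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two lat/lon points."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def flag_unpermitted(detections, permits, radius_m=50.0):
    """detections / permits: lists of (lat, lon) tuples. Returns detections
    with no permit within radius_m, i.e., candidate unpermitted sheds."""
    unpermitted = []
    for d in detections:
        if not any(haversine_m(*d, *p) <= radius_m for p in permits):
            unpermitted.append(d)
    return unpermitted

# Toy example with made-up coordinates near Manhattan.
permits = [(40.7301, -73.9920)]
detections = [(40.7302, -73.9921), (40.7410, -73.9800)]
print(flag_unpermitted(detections, permits))  # only the second detection is flagged
```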
Abstract:This research explores whether the interaction between adversarial robots and creative practitioners can push artists to rethink their initial ideas. It also explores how working with these robots may influence artists' views of machines designed for creative tasks or collaboration. Many existing robots developed for creativity and the arts focus on complementing creative practices, but what if robots challenged ideas instead? To begin investigating this, I designed UnsTable, a robot drawing desk that interferes with the drawing process by moving the paper while participants (N=19) draw. This inquiry invites further research into adversarial robots designed to challenge creative practitioners.
Abstract:Clothing for robots can help expand a robot's functionality and clarify the robot's purpose to bystanders. In studying how to design clothing for robots, we can shed light on the functional role of aesthetics in interactive system design. We present a case study of designing a utility belt for an agricultural robot. We use reflection-in-action to consider the ways that observation, in situ making, and documentation serve to illuminate how pragmatic, aesthetic, and intellectual inquiry are layered in this applied design research project. Themes explored in this pictorial include 1) contextual discovery of materials, tools, and practices, 2) design space exploration of materials in context, 3) improvising spaces for making, and 4) social processes in design. These themes emerged from the qualitative coding of 25 reflection-in-action videos from the researcher. We conclude with feedback on the utility belt prototypes for an agricultural robot and our learnings about the context, materials, and people needed to design successful novel clothing forms for robots.
Abstract:In this demonstration, we exhibit the initial results of an ongoing body of exploratory work investigating the potential for creative machines to communicate and collaborate with people through movement as a form of implicit interaction. The paper describes a Wizard-of-Oz demo in which a hidden wizard controls an AxiDraw drawing robot while a participant collaborates with it to draw a custom postcard. This demonstration aims to gather perspectives from the computational fabrication community regarding how practitioners who fabricate with machines experience interacting with a mixed-initiative collaborative machine.
Abstract:For a robot to repair its own error, it must first know it has made a mistake. One way that people detect errors is through the implicit reactions of bystanders -- their confusion, smirks, or giggles clue us in that something unexpected occurred. To enable robots to detect and act on bystander responses to task failures, we developed a novel method to elicit bystander responses to human and robot errors. Using 46 different stimulus videos featuring a variety of human and machine task failures, we collected a total of 2452 webcam videos of human reactions from 54 participants. To test the viability of the collected data, we used the bystander reaction dataset as input to a deep-learning model, BADNet, to predict failure occurrence. We tested different data labeling methods and examined how they affect model performance, achieving precisions above 90%. We discuss strategies for modeling bystander reactions and predicting failure, and how this approach can be used in real-world robotic deployments to detect errors and improve robot performance. As part of this work, we also contribute the "Bystander Affect Detection" (BAD) dataset of bystander reactions, supporting the development of better prediction models.
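One way to make the data-labeling question concrete is window-level labeling: split each reaction clip into fixed-length windows and mark a window positive if it overlaps the failure interval of the stimulus video. The sketch below illustrates only this general idea; it is not the labeling scheme (or BADNet pipeline) used in the paper, and all parameters are illustrative.

```python
# Hedged sketch of one possible window-labeling scheme for bystander-reaction
# clips. The paper compares several labeling methods; this shows only the
# general idea of overlap-based window labels.

def label_windows(video_len_s, failure_start_s, failure_end_s,
                  window_s=2.0, stride_s=1.0):
    """Return (start, end, label) tuples; label is 1 if the window overlaps
    the failure interval, else 0."""
    windows = []
    t = 0.0
    while t + window_s <= video_len_s:
        start, end = t, t + window_s
        overlaps = start < failure_end_s and end > failure_start_s
        windows.append((start, end, int(overlaps)))
        t += stride_s
    return windows

# Example: a 10 s reaction clip whose stimulus failure occurs at 4-6 s.
for w in label_windows(10.0, 4.0, 6.0):
    print(w)
```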
Abstract:Robots that carry out tasks and interact in complex environments will inevitably commit errors. Error detection is thus an important ability for robots to master in order to work efficiently and productively. People leverage social cues from others around them to recognize and repair their own mistakes. With advances in computing and AI, it is increasingly possible for robots to achieve a similar error detection capability. In this work, we review the current literature on how social cues can be used to recognize task failures in human-robot interaction (HRI). This literature review unites insights from behavioral science, human-robot interaction, and machine learning to focus on three areas: 1) social cues for error detection (from behavioral science), 2) recognizing task failures in robots (from HRI), and 3) approaches for autonomous detection of HRI task failures based on social cues (from machine learning). We propose a taxonomy of error detection based on self-awareness and social feedback. Finally, we offer recommendations for HRI researchers and practitioners interested in developing robots that detect (physical) task errors using social cues from bystanders.