Abstract:Street scene datasets, collected from Street View or dashboard cameras, offer a promising means of detecting urban objects and incidents like street flooding. However, a major challenge in using these datasets is their lack of reliable labels: there are myriad types of incidents, many types occur rarely, and ground-truth measures of where incidents occur are lacking. Here, we propose BayFlood, a two-stage approach that circumvents this difficulty. First, we perform zero-shot classification of where incidents occur using a pretrained vision-language model (VLM). Second, we fit a spatial Bayesian model on the VLM classifications. The zero-shot approach avoids the need to annotate large training sets, and the Bayesian model provides desiderata frequently needed in urban settings: principled measures of uncertainty, smoothing across locations, and incorporation of external data like stormwater accumulation zones. We comprehensively validate this two-stage approach, showing that VLMs provide a strong zero-shot signal for floods across multiple cities and time periods, that the Bayesian model improves out-of-sample prediction relative to baseline methods, and that our inferred flood risk correlates with known external predictors of risk. Having validated our approach, we show it can be used to improve urban flood detection: our analysis reveals 113,738 people at high risk of flooding who are overlooked by current methods, identifies demographic biases in existing methods, and suggests locations for new flood sensors. More broadly, our results showcase how Bayesian modeling of zero-shot LM annotations represents a promising paradigm: it avoids the need to collect large labeled datasets and leverages the power of foundation models while providing the expressiveness and uncertainty quantification of Bayesian models.
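To make the two-stage idea concrete, here is a minimal, self-contained sketch (not the paper's implementation) of the second stage: pooling per-location zero-shot VLM flood labels with an empirical-Bayes beta-binomial model to obtain smoothed risk estimates with uncertainty. The counts are synthetic stand-ins for a hypothetical stage-1 output, and the sketch omits spatial correlation and external covariates such as stormwater accumulation zones.

```python
# Minimal sketch (not BayFlood itself): partial pooling of per-location
# zero-shot VLM flood labels via an empirical-Bayes beta-binomial model.
# All counts below are simulated rather than produced by a real VLM.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical stage-1 output: number of street-view images the VLM
# flagged as flooded (k) out of those sampled at each location (n).
n = rng.integers(20, 200, size=500)        # images per location
true_rate = rng.beta(1.5, 30.0, size=500)  # latent per-location flood rates
k = rng.binomial(n, true_rate)             # VLM-positive images

# Fit a Beta(a, b) prior by moment matching on the raw rates (empirical Bayes).
p_hat = k / n
m, v = p_hat.mean(), p_hat.var()
common = m * (1 - m) / v - 1
a, b = m * common, (1 - m) * common

# Each location's posterior is Beta(a + k, b + n - k): noisy estimates are
# shrunk toward the citywide mean and come with credible intervals.
post = stats.beta(a + k, b + n - k)
risk_mean = post.mean()
risk_lo, risk_hi = post.ppf(0.05), post.ppf(0.95)
print(f"prior Beta({a:.2f}, {b:.2f}); first location: "
      f"{risk_mean[0]:.3f} [{risk_lo[0]:.3f}, {risk_hi[0]:.3f}]")
```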
Abstract:Internet-scale datasets are a luxury for human-robot interaction (HRI) researchers, as collecting natural interaction data in the wild is time-consuming and logistically challenging. The problem is exacerbated by robots' differing form factors and interaction modalities. Inspired by recent work on ethnomethodology and conversation analysis (EMCA) in the domain of HRI, we propose ReStory, a method that has the potential to augment existing in-the-wild human-robot interaction datasets by leveraging vision-language models. While still requiring human supervision, ReStory can synthesize human-interpretable interaction scenarios in the form of storyboards. We hope our proposed approach provides HRI researchers and interaction designers with a new angle on utilizing their valuable and scarce data.
Abstract:Despite recent advancements in robotics and machine learning (ML), the deployment of autonomous robots in our everyday lives remains an open challenge. This is due to multiple reasons, among them their frequent mistakes, such as interrupting people or responding with delays, as well as their limited ability to understand human speech, i.e., failures in tasks like transcribing speech to text. These mistakes may disrupt interactions and negatively influence human perception of these robots. To address this problem, robots need the ability to detect human-robot interaction (HRI) failures. The ERR@HRI 2024 challenge tackles this by offering a benchmark multimodal dataset of robot failures during human-robot interactions, encouraging researchers to develop and benchmark multimodal machine learning models to detect these failures. We created a dataset featuring multimodal non-verbal interaction data, including facial, speech, and pose features from video clips of interactions with a robotic coach, annotated with labels indicating the presence or absence of robot mistakes, user awkwardness, and interaction ruptures, allowing for the training and evaluation of predictive models. Challenge participants have been invited to submit their multimodal ML models for the detection of robot errors, to be evaluated against performance metrics such as accuracy, precision, recall, and F1 score, with and without a margin of error reflecting the time-sensitivity of these metrics. The results of this challenge will help the research field better understand robot failures in human-robot interaction and design autonomous robots that can mitigate their own errors after successfully detecting them.
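The time-tolerant metrics mentioned above can be illustrated with a short sketch (not the official challenge scorer): a predicted failure counts as a true positive if it falls within a chosen margin of an annotated failure, and precision, recall, and F1 follow from a greedy one-to-one matching. The function name, margin, and timestamps below are all hypothetical.

```python
# Illustrative sketch of time-tolerant detection metrics: a prediction is
# correct if it lands within `margin` seconds of an annotated failure.
def f1_with_margin(pred_times, true_times, margin=2.0):
    """Greedy one-to-one matching of predicted to annotated failure times."""
    unmatched = sorted(true_times)
    tp = 0
    for p in sorted(pred_times):
        hit = next((t for t in unmatched if abs(p - t) <= margin), None)
        if hit is not None:
            unmatched.remove(hit)  # each annotation can be matched only once
            tp += 1
    precision = tp / len(pred_times) if pred_times else 0.0
    recall = tp / len(true_times) if true_times else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# With margin=2.0 both annotations are matched: precision 2/3, recall 1.0.
print(f1_with_margin(pred_times=[3.1, 10.4, 25.0], true_times=[3.0, 11.9]))
```

Setting `margin=0` recovers the strict, time-exact version of the same metrics, which is one way to read "with and without a margin of error."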
Abstract:Human-robot interaction needs to be studied in the wild. In the summers of 2022 and 2023, we deployed two trash barrel service robots using a Wizard-of-Oz protocol in public spaces to study human-robot interactions in urban settings. We deployed the robots at two public plazas in downtown Manhattan and Brooklyn for a combined 20 hours of field time. To date, relatively few long-term human-robot interaction studies have been conducted in shared public spaces. To support researchers aiming to fill this gap, we share insights and lessons learned on deploying robots in public spaces that can benefit both researchers and practitioners. We share best practices and lessons learned with the HRI research community to encourage more in-the-wild research on robots in public spaces, and we call on the community to contribute their own lessons learned to a GitHub repository.
Abstract:This paper introduces our dataset featuring human-robot interactions (HRI) in urban public environments. The dataset is rich with social signals that we believe can be modeled to help understand naturalistic human-robot interaction. It currently comprises approximately 15 hours of video footage recorded from the robots' perspectives, within which we annotated a total of 274 observable interactions featuring a wide range of naturalistic human-robot interactions. The data was collected by two mobile trash barrel robots deployed at Astor Place, New York City, over the course of a week. We invite the HRI community to access and utilize our dataset. To the best of our knowledge, this is the first dataset showcasing robot deployments in a completely public, non-controlled setting involving urban residents.
Abstract:Machine learning models are commonly tested in-distribution (on the same dataset); performance almost always drops in out-of-distribution settings. For HRI research, the goal is often to develop generalized models, which makes domain generalization (retaining performance across different settings) a critical issue. In this study, we present a concise analysis of domain generalization in failure detection models trained on human facial expressions. Using two distinct datasets of humans reacting to videos in which errors occur, one from a controlled lab setting and another collected online, we trained deep learning models on each dataset. When testing these models on the alternate dataset, we observed a significant performance drop. We reflect on the causes of the observed model behavior and offer recommendations. This work emphasizes the need for HRI research focused on improving model robustness and real-life applicability.
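A minimal sketch of the cross-dataset evaluation protocol described above, assuming each dataset has already been reduced to fixed-length facial-feature vectors with binary error/no-error labels. The synthetic data, feature dimension, and logistic-regression stand-in for the deep models are all assumptions made for illustration.

```python
# Sketch of a domain-generalization check: train on one dataset, evaluate
# both in-distribution (held-out split) and on the other dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)

def synth(n, shift):
    """Stand-in for a real dataset; `shift` mimics a domain gap."""
    X = rng.normal(shift, 1.0, size=(n, 32))
    y = (X[:, 0] + rng.normal(0, 1.0, n) > shift).astype(int)
    return X, y

datasets = {"lab": synth(600, shift=0.0), "online": synth(600, shift=1.5)}

for src, (Xs, ys) in datasets.items():
    clf = LogisticRegression(max_iter=1000).fit(Xs[:400], ys[:400])
    for tgt, (Xt, yt) in datasets.items():
        X_eval, y_eval = (Xt[400:], yt[400:]) if tgt == src else (Xt, yt)
        kind = "in-dist" if tgt == src else "cross-dataset"
        print(f"train={src} test={tgt} ({kind}): "
              f"F1={f1_score(y_eval, clf.predict(X_eval)):.2f}")
```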
Abstract:Scaffolds, also called sidewalk sheds, are intended to be temporary structures that protect pedestrians from construction and repair hazards. However, some sidewalk sheds are left up for years. Long-standing scaffolds become eyesores, create accessibility issues on sidewalks, and give cover to illicit activity. Today, there are over 8,000 active permits for scaffolds in NYC; the more problematic scaffolds are likely expired or unpermitted. This research uses computer vision on street-level imagery to develop a longitudinal map of scaffolding throughout the city. Using a dataset of 29,156,833 dashcam images taken between August 2023 and January 2024, we develop an algorithm to track the presence of scaffolding over time. We also design and implement methods to match detected scaffolds to reported locations of active scaffolding permits, enabling the identification of sidewalk sheds without corresponding permits. We identify 850,766 images of scaffolding, tagging 5,156 active sidewalk sheds and estimating 529 unpermitted sheds. We discuss the implications of an in-the-wild scaffolding classifier for urban tech, innovations to governmental inspection processes, and out-of-distribution evaluations outside of New York City.
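As a rough illustration of the permit-matching step (a simplified stand-in, not the paper's procedure), one can match each clustered scaffold detection to the nearest reported permit location and flag detections with no permit within a distance threshold. The coordinates and the 50 m threshold below are invented.

```python
# Simplified sketch: flag scaffold detections with no active permit nearby.
import numpy as np

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between coordinate arrays."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 6_371_000 * 2 * np.arcsin(np.sqrt(a))

# Hypothetical clustered detections (e.g., centroids of repeated dashcam hits)
# and reported permit locations, as (latitude, longitude) pairs.
detections = np.array([[40.7306, -73.9866], [40.7191, -74.0020]])
permits    = np.array([[40.7304, -73.9870]])

THRESHOLD_M = 50  # assume a permit within 50 m explains a detection
d = haversine_m(detections[:, None, 0], detections[:, None, 1],
                permits[None, :, 0], permits[None, :, 1])
unpermitted = d.min(axis=1) > THRESHOLD_M
print("likely unpermitted sheds:", detections[unpermitted])
```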
Abstract:This research explores whether interaction between adversarial robots and creative practitioners can push artists to rethink their initial ideas. It also explores how working with these robots may influence artists' views of machines designed for creative tasks or collaboration. Many existing robots developed for creativity and the arts focus on complementing creative practices, but what if robots challenged ideas instead? To begin investigating this, I designed UnsTable, a robot drawing desk that interferes with the drawing process by moving the paper while participants (N=19) draw. This inquiry invites further research into adversarial robots designed to challenge creative practitioners.
Abstract:Clothing for robots can help expand a robot's functionality and also clarify the robot's purpose to bystanders. In studying how to design clothing for robots, we can shed light on the functional role of aesthetics in interactive system design. We present a case study of designing a utility belt for an agricultural robot. We use reflection-in-action to consider the ways that observation, in situ making, and documentation serve to illuminate how pragmatic, aesthetic, and intellectual inquiry are layered in this applied design research project. Themes explored in this pictorial include 1) contextual discovery of materials, tools, and practices, 2) design space exploration of materials in context, 3) improvising spaces for making, and 4) social processes in design. These themes emerged from the qualitative coding of 25 reflection-in-action videos from the researcher. We conclude with feedback on the utility belt prototypes for an agricultural robot and our learnings about the context, materials, and people needed to design successful novel clothing forms for robots.
Abstract:In this demonstration, we exhibit the initial results of an ongoing body of exploratory work investigating the potential for creative machines to communicate and collaborate with people through movement as a form of implicit interaction. The paper describes a Wizard-of-Oz demo in which a hidden wizard controls an AxiDraw drawing robot while a participant collaborates with it to draw a custom postcard. This demonstration aims to gather perspectives from the computational fabrication community on how practitioners who fabricate with machines experience interacting with a mixed-initiative collaborative machine.