Andrew Hundt

LLM-Driven Robots Risk Enacting Discrimination, Violence, and Unlawful Actions

Jun 13, 2024

Rumaisa Azeem, Andrew Hundt, Masoumeh Mansouri, Martim Brandão

Abstract: Members of the Human-Robot Interaction (HRI) and Artificial Intelligence (AI) communities have proposed Large Language Models (LLMs) as a promising resource for robotics tasks such as natural language interactions, doing household and workplace tasks, approximating 'common sense reasoning', and modeling humans. However, recent research has raised concerns about the potential for LLMs to produce discriminatory outcomes and unsafe behaviors in real-world robot experiments and applications. To address these concerns, we conduct an HRI-based evaluation of discrimination and safety criteria on several highly-rated LLMs. Our evaluation reveals that LLMs currently lack robustness when encountering people across a diverse range of protected identity characteristics (e.g., race, gender, disability status, nationality, religion, and their intersections), producing biased outputs consistent with directly discriminatory outcomes -- e.g., 'gypsy' and 'mute' people are labeled untrustworthy, but not 'european' or 'able-bodied' people. Furthermore, we test models in settings with unconstrained natural language (open vocabulary) inputs, and find they fail to act safely, generating responses that accept dangerous, violent, or unlawful instructions -- such as incident-causing misstatements, taking people's mobility aids, and sexual predation. Our results underscore the urgent need for systematic, routine, and comprehensive risk assessments and assurances to improve outcomes and ensure LLMs only operate on robots when it is safe, effective, and just to do so. Data and code will be made available.

* 40 pages (52 with references), 21 Figures, 6 Tables

Love, Joy, and Autism Robots: A Metareview and Provocatype

Mar 08, 2024

Andrew Hundt, Gabrielle Ohlson, Pieter Wolfert, Lux Miranda, Sophia Zhu, Katie Winkle

Abstract: Previous work has observed how Neurodivergence is often harmfully pathologized in Human-Computer Interaction (HCI) and Human-Robot Interaction (HRI) research. We conduct a review of autism robot reviews and find the dominant research direction is Autistic people's second-to-lowest (24 of 25) research priority: interventions and treatments purporting to 'help' neurodivergent individuals to conform to neurotypical social norms, become better behaved, improve social and emotional skills, and otherwise 'fix' us -- rarely prioritizing the internal experiences that might lead to such differences. Furthermore, a growing body of evidence indicates many of the most popular current approaches risk inflicting lasting trauma and damage on Autistic people. We draw on the principles and findings of the latest Autism research, Feminist HRI, and Robotics to imagine a role reversal, analyze the implications, then conclude with actionable guidance on Autistic-led scientific methods and research directions.

* 3 pages; In Assistive Applications, Accessibility, and Disability Ethics (A3DE) workshop at the Human Robot Interaction (HRI) Conference 2024; https://sites.google.com/view/love-joy-and-autism-robots/home

Towards Equitable Agile Research and Development of AI and Robotics

Feb 13, 2024

Andrew Hundt, Julia Schuller, Severin Kacianka

Abstract: Machine Learning (ML) and 'Artificial Intelligence' ('AI') methods tend to replicate and amplify existing biases and prejudices, as do Robots with AI. For example, robots with facial recognition have failed to identify Black Women as human, while others have categorized people, such as Black Men, as criminals based on appearance alone. A 'culture of modularity' means harms are perceived as 'out of scope', or someone else's responsibility, throughout employment positions in the 'AI supply chain'. Incidents are routine enough (incidentdatabase.ai lists over 2000 examples) to indicate that few organizations are capable of completely respecting people's rights; meeting claimed equity, diversity, and inclusion (EDI or DEI) goals; or recognizing and then addressing such failures in their organizations and artifacts. We propose a framework for adapting widely practiced Research and Development (R&D) project management methodologies to build organizational equity capabilities and better integrate known evidence-based best practices. We describe how project teams can organize and operationalize the most promising practices, skill sets, organizational cultures, and methods to detect and address rights-based fairness, equity, accountability, and ethical problems as early as possible, when they are often less harmful and easier to mitigate; then monitor for unforeseen incidents to adaptively and constructively address them. Our primary example adapts an Agile development process based on Scrum, one of the most widely adopted approaches to organizing R&D teams. We also discuss limitations of our proposed framework and future research directions.

* 15 pages (32 with refs + appendix), 2 figures, 1 table (7 with appendix), incorporates changes based on WeRobot 2023 Draft feedback

SCoFT: Self-Contrastive Fine-Tuning for Equitable Image Generation

Jan 16, 2024

Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, Jean Oh

Abstract: Accurate representation in media is known to improve the well-being of the people who consume it. Generative image models trained on large web-crawled datasets such as LAION are known to produce images with harmful stereotypes and misrepresentations of cultures. We improve inclusive representation in generated images by (1) engaging with communities to collect a culturally representative dataset that we call the Cross-Cultural Understanding Benchmark (CCUB) and (2) proposing a novel Self-Contrastive Fine-Tuning (SCoFT) method that leverages the model's known biases to self-improve. SCoFT is designed to prevent overfitting on small datasets, encode only high-level information from the data, and shift the generated distribution away from misrepresentations encoded in a pretrained model. Our user study conducted on 51 participants from 5 different countries based on their self-selected national cultural affiliation shows that fine-tuning on CCUB consistently generates images with higher cultural relevance and fewer stereotypes when compared to the Stable Diffusion baseline, which is further improved with our SCoFT technique.

Robots Enact Malignant Stereotypes

Jul 23, 2022

Andrew Hundt, William Agnew, Vicky Zeng, Severin Kacianka, Matthew Gombolay

Abstract: Stereotypes, bias, and discrimination have been extensively documented in Machine Learning (ML) methods such as Computer Vision (CV) [18, 80], Natural Language Processing (NLP) [6], or both, in the case of large image and caption models such as OpenAI CLIP [14]. In this paper, we evaluate how ML bias manifests in robots that physically and autonomously act within the world. We audit one of several recently published CLIP-powered robotic manipulation methods, presenting it with objects that have pictures of human faces on the surface which vary across race and gender, alongside task descriptions that contain terms associated with common stereotypes. Our experiments definitively show robots acting out toxic stereotypes with respect to gender, race, and scientifically-discredited physiognomy, at scale. Furthermore, the audited methods are less likely to recognize Women and People of Color. Our interdisciplinary sociotechnical analysis synthesizes across fields and applications such as Science Technology and Society (STS), Critical Studies, History, Safety, Robotics, and AI. We find that robots powered by large datasets and Dissolution Models (sometimes called "foundation models", e.g. CLIP) that contain humans risk physically amplifying malignant stereotypes in general; and that merely correcting disparities will be insufficient for the complexity and scale of the problem. Instead, we recommend that robot learning methods that physically manifest stereotypes or other harmful outcomes be paused, reworked, or even wound down when appropriate, until outcomes can be proven safe, effective, and just. Finally, we discuss comprehensive policy changes and the potential of new interdisciplinary research on topics like Identity Safety Assessment Frameworks and Design Justice to better understand and address these harms.

* In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT 22). ACM, New York, NY, USA, 743-756
* 30 pages, 10 figures, 5 tables. Website: https://sites.google.com/view/robots-enact-stereotypes . Published in the 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT 22), June 21-24, 2022, Seoul, Republic of Korea. ACM, DOI: https://doi.org/10.1145/3531146.3533138 . FAccT22 Submission dates: Abstract Dec 13, 2021; Submitted Jan 22, 2022; Accepted Apr 7, 2022

"Good Robot!": Efficient Reinforcement Learning for Multi-Step Visual Tasks via Reward Shaping

Sep 25, 2019

Andrew Hundt, Benjamin Killeen, Heeyeon Kwon, Chris Paxton, Gregory D. Hager

Abstract: In order to learn effectively, robots must be able to extract the intangible context by which task progress and mistakes are defined. In the domain of reinforcement learning, much of this information is provided by the reward function. Hence, reward shaping is a necessary part of how we can achieve state-of-the-art results on complex, multi-step tasks. However, comparatively little work has examined how reward shaping should be done so that it captures task context, particularly in scenarios where the task is long-horizon and failure is highly consequential. Our Schedule for Positive Task (SPOT) reward trains our Efficient Visual Task (EVT) model to solve problems that require an understanding of both task context and workspace constraints of multi-step block arrangement tasks. In simulation EVT can completely clear adversarial arrangements of objects by pushing and grasping in 99% of cases vs an 82% baseline in prior work. For random arrangements EVT clears 100% of test cases at 86% action efficiency vs 61% efficiency in prior work. EVT + SPOT is also able to demonstrate context understanding and complete stacks in 74% of trials compared to a baseline of 5% with EVT alone. To our knowledge, this is the first instance of a Reinforcement Learning based algorithm successfully completing such a challenge. Code is available at https://github.com/jhu-lcsr/good_robot .

* 7 pages, 6 figures, code is available at https://github.com/jhu-lcsr/good_robot

sharpDARTS: Faster and More Accurate Differentiable Architecture Search

Mar 23, 2019

Andrew Hundt, Varun Jain, Gregory D. Hager

Abstract: Neural Architecture Search (NAS) has been a source of dramatic improvements in neural network design, with recent results meeting or exceeding the performance of hand-tuned architectures. However, our understanding of how to represent the search space for neural net architectures and how to search that space efficiently are both still in their infancy. We have performed an in-depth analysis to identify limitations in a widely used search space and a recent architecture search method, Differentiable Architecture Search (DARTS). These findings led us to introduce novel network blocks with a more general, balanced, and consistent design; a better-optimized Cosine Power Annealing learning rate schedule; and other improvements. Our resulting sharpDARTS search is 50% faster with a 20-30% relative improvement in final model error on CIFAR-10 when compared to DARTS. Our best single model run has 1.93% (1.98+/-0.07) validation error on CIFAR-10 and 5.5% error (5.8+/-0.3) on the recently released CIFAR-10.1 test set. To our knowledge, both are state of the art for models of similar size. This model also generalizes competitively to ImageNet at 25.1% top-1 (7.8% top-5) error. We found improvements for existing search spaces, but does DARTS generalize to new domains? We propose Differentiable Hyperparameter Grid Search and the HyperCuboid search space, which are representations designed to leverage DARTS for more general parameter optimization. Here we find that DARTS fails to generalize when compared against a human's one-shot choice of models. We look back to the DARTS and sharpDARTS search spaces to understand why, and an ablation study reveals an unusual generalization gap. We finally propose Max-W regularization to solve this problem, which proves significantly better than the handmade design. Code will be made available.

* 9 pages, 6 figures, 4 tables

The CoSTAR Block Stacking Dataset: Learning with Workspace Constraints

Mar 12, 2019

Andrew Hundt, Varun Jain, Chia-Hung Lin, Chris Paxton, Gregory D. Hager

Abstract: A robot can now grasp an object more effectively than ever before, but once it has the object, what happens next? We show that a mild relaxation of the task and workspace constraints implicit in existing object grasping datasets can cause neural network based grasping algorithms to fail on even a simple block stacking task when executed under more realistic circumstances. To address this, we introduce the JHU CoSTAR Block Stacking Dataset (BSD), where a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment style block stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets. There are nearly 12,000 stacking attempts and over 2 million frames of real data. We discuss the ways in which this dataset provides a valuable resource for a broad range of other topics of investigation. We find that hand-designed neural networks that work on prior datasets do not generalize to this task. Thus, to establish a baseline for this dataset, we demonstrate an automated search of neural network based models using a novel multiple-input HyperTree MetaModel, and find a final model which makes reasonable 3D pose predictions for grasping and stacking on our dataset. The CoSTAR BSD, code, and instructions are available at https://sites.google.com/site/costardataset.

* This is a major revision refocusing the topic towards the JHU CoSTAR Block Stacking Dataset, workspace constraints, and a comparison of HyperTrees with hand-designed algorithms. 12 pages, 10 figures, and 3 tables

Evaluating Methods for End-User Creation of Robot Task Plans

Nov 06, 2018

Chris Paxton, Felix Jonathan, Andrew Hundt, Bilge Mutlu, Gregory D. Hager

Abstract: How can we enable users to create effective, perception-driven task plans for collaborative robots? We conducted a 35-person user study with the Behavior Tree-based CoSTAR system to determine which strategies for end-user creation of generalizable robot task plans are most usable and effective. CoSTAR allows domain experts to author complex, perceptually grounded task plans for collaborative robots. As a part of CoSTAR's wide range of capabilities, it allows users to specify SmartMoves: abstract goals such as "pick up component A from the right side of the table." Users were asked to perform pick-and-place assembly tasks with either SmartMoves or one of three simpler baseline versions of CoSTAR. Overall, participants found CoSTAR to be highly usable, with an average System Usability Scale score of 73.4 out of 100. SmartMove also helped users perform tasks faster and more effectively; all SmartMove users completed the first two tasks, while not all users completed the tasks using the other strategies. SmartMove users showed better performance for incorporating perception across all three tasks.

* 2018 IEEE Conference on Intelligent Robots and Systems
* 7 pages; IROS 2018

User Experience of the CoSTAR System for Instruction of Collaborative Robots

Mar 23, 2017

Chris Paxton, Felix Jonathan, Andrew Hundt, Bilge Mutlu, Gregory D. Hager

Abstract: How can we enable novice users to create effective task plans for collaborative robots? Must there be a tradeoff between generalizability and ease of use? To answer these questions, we conducted a user study with the CoSTAR system, which integrates perception and reasoning into a Behavior Tree-based task plan editor. In our study, we ask novice users to perform simple pick-and-place assembly tasks under varying perception and planning capabilities. Our study shows that users found Behavior Trees to be an effective way of specifying task plans. Furthermore, users were also able to more quickly, effectively, and generally author task plans with the addition of CoSTAR's planning, perception, and reasoning capabilities. Despite these improvements, concepts associated with these capabilities were rated by users as less usable, and our results suggest a direction for further refinement.

* 8 pages, currently in peer review. Video: https://www.youtube.com/watch?v=uf_3P6TmVrQ
