Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Isaac Sheidlower

How Users Understand Robot Foundation Model Performance through Task Success Rates and Beyond

Feb 03, 2026

Isaac Sheidlower, Jindan Huang, James Staley, Bingyu Wu, Qicong Chen, Reuben Aronson, Elaine Short

Abstract:Robot Foundation Models (RFMs) represent a promising approach to developing general-purpose home robots. Given the broad capabilities of RFMs, users will inevitably ask an RFM-based robot to perform tasks that the RFM was not trained or evaluated on. In these cases, it is crucial that users understand the risks associated with attempting novel tasks due to the relatively high cost of failure. Furthermore, an informed user who understands an RFM's capabilities will know what situations and tasks the robot can handle. In this paper, we study how non-roboticists interpret performance information from RFM evaluations. These evaluations typically report task success rate (TSR) as the primary performance metric. While TSR is intuitive to experts, it is necessary to validate whether novices also use this information as intended. Toward this end, we conducted a study in which users saw real evaluation data, including TSR, failure case descriptions, and videos from multiple published RFM research projects. The results highlight that non-experts not only use TSR in a manner consistent with expert expectations but also highly value other information types, such as failure cases that are not often reported in RFM evaluations. Furthermore, we find that users want access to both real data from previous evaluations of the RFM and estimates from the robot about how well it will do on a novel task.

* Submitted to IJCAI 2026

Via

Access Paper or Ask Questions

On the Effect of Robot Errors on Human Teaching Dynamics

Sep 15, 2024

Jindan Huang, Isaac Sheidlower, Reuben M. Aronson, Elaine Schaertl Short

Abstract:Human-in-the-loop learning is gaining popularity, particularly in the field of robotics, because it leverages human knowledge about real-world tasks to facilitate agent learning. When people instruct robots, they naturally adapt their teaching behavior in response to changes in robot performance. While current research predominantly focuses on integrating human teaching dynamics from an algorithmic perspective, understanding these dynamics from a human-centered standpoint is an under-explored, yet fundamental problem. Addressing this issue will enhance both robot learning and user experience. Therefore, this paper explores one potential factor contributing to the dynamic nature of human teaching: robot errors. We conducted a user study to investigate how the presence and severity of robot errors affect three dimensions of human teaching dynamics: feedback granularity, feedback richness, and teaching time, in both forced-choice and open-ended teaching contexts. The results show that people tend to spend more time teaching robots with errors, provide more detailed feedback over specific segments of a robot's trajectory, and that robot error can influence a teacher's choice of feedback modality. Our findings offer valuable insights for designing effective interfaces for interactive learning and optimizing algorithms to better understand human intentions.

* Accepted to 2024 International Conference on Human-Agent Interaction (HAI)

Via

Access Paper or Ask Questions

Towards Interpretable Foundation Models of Robot Behavior: A Task Specific Policy Generation Approach

Jul 10, 2024

Isaac Sheidlower, Reuben Aronson, Elaine Schaertl Short

Abstract:Foundation models are a promising path toward general-purpose and user-friendly robots. The prevalent approach involves training a generalist policy that, like a reinforcement learning policy, uses observations to output actions. Although this approach has seen much success, several concerns arise when considering deployment and end-user interaction with these systems. In particular, the lack of modularity between tasks means that when model weights are updated (e.g., when a user provides feedback), the behavior in other, unrelated tasks may be affected. This can negatively impact the system's interpretability and usability. We present an alternative approach to the design of robot foundation models, Diffusion for Policy Parameters (DPP), which generates stand-alone, task-specific policies. Since these policies are detached from the foundation model, they are updated only when a user wants, either through feedback or personalization, allowing them to gain a high degree of familiarity with that policy. We demonstrate a proof-of-concept of DPP in simulation then discuss its limitations and the future of interpretable foundation models.

* Short Paper accepted to RLC 2024 Workshop on Training Agents with Foundation Models

Via

Access Paper or Ask Questions

Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies

Jun 19, 2024

Isaac Sheidlower, Emma Bethel, Douglas Lilly, Reuben M. Aronson, Elaine Schaertl Short

Figure 1 for Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies

Figure 2 for Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies

Figure 3 for Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies

Figure 4 for Imagining In-distribution States: How Predictable Robot Behavior Can Enable User Control Over Learned Policies

Abstract:It is crucial that users are empowered to take advantage of the functionality of a robot and use their understanding of that functionality to perform novel and creative tasks. Given a robot trained with Reinforcement Learning (RL), a user may wish to leverage that autonomy along with their familiarity of how they expect the robot to behave to collaborate with the robot. One technique is for the user to take control of some of the robot's action space through teleoperation, allowing the RL policy to simultaneously control the rest. We formalize this type of shared control as Partitioned Control (PC). However, this may not be possible using an out-of-the-box RL policy. For example, a user's control may bring the robot into a failure state from the policy's perspective, causing it to act unexpectedly and hindering the success of the user's desired task. In this work, we formalize this problem and present Imaginary Out-of-Distribution Actions, IODA, an initial algorithm which empowers users to leverage their expectations of a robot's behavior to accomplish new tasks. We deploy IODA in a user study with a real robot and find that IODA leads to both better task performance and a higher degree of alignment between robot behavior and user expectation. We also show that in PC, there is a strong and significant correlation between task performance and the robot's ability to meet user expectations, highlighting the need for approaches like IODA. Code is available at https://github.com/AABL-Lab/ioda_roman_2024

* Accepted to IEEE RO-MAN 2024 as a regular paper. arXiv admin note: substantial text overlap with arXiv:2312.05991

Via

Access Paper or Ask Questions

Modifying RL Policies with Imagined Actions: How Predictable Policies Can Enable Users to Perform Novel Tasks

Dec 10, 2023

Isaac Sheidlower, Reuben Aronson, Elaine Short

Figure 1 for Modifying RL Policies with Imagined Actions: How Predictable Policies Can Enable Users to Perform Novel Tasks

Abstract:It is crucial that users are empowered to use the functionalities of a robot to creatively solve problems on the fly. A user who has access to a Reinforcement Learning (RL) based robot may want to use the robot's autonomy and their knowledge of its behavior to complete new tasks. One way is for the user to take control of some of the robot's action space through teleoperation while the RL policy simultaneously controls the rest. However, an out-of-the-box RL policy may not readily facilitate this. For example, a user's control may bring the robot into a failure state from the policy's perspective, causing it to act in a way the user is not familiar with, hindering the success of the user's desired task. In this work, we formalize this problem and present Imaginary Out-of-Distribution Actions, IODA, an initial algorithm for addressing that problem and empowering user's to leverage their expectation of a robot's behavior to accomplish new tasks.

* Pre-print to be published in the AAAI Fall Symposium 2023 Proceedings (part of the AI-HRI Symposium)

Via

Access Paper or Ask Questions