Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniel Szafir

Practice Makes Perfect: A Study of Digital Twin Technology for Assembly and Problem-solving using Lunar Surface Telerobotics

May 19, 2025

Xavier O'Keefe, Katy McCutchan, Alexis Muniz, Jack Burns, Daniel Szafir

Abstract:Robotic systems that can traverse planetary or lunar surfaces to collect environmental data and perform physical manipulation tasks, such as assembling equipment or conducting mining operations, are envisioned to form the backbone of future human activities in space. However, the environmental conditions in which these robots, or "rovers," operate present challenges toward achieving fully autonomous solutions, meaning that rover missions will require some degree of human teleoperation or supervision for the foreseeable future. As a result, human operators require training to successfully direct rovers and avoid costly errors or mission failures, as well as the ability to recover from any issues that arise on the fly during mission activities. While analog environments, such as JPL's Mars Yard, can help with such training by simulating surface environments in the real world, access to such resources may be rare and expensive. As an alternative or supplement to such physical analogs, we explore the design and evaluation of a virtual reality digital twin system to train human teleoperation of robotic rovers with mechanical arms for space mission activities. We conducted an experiment with 24 human operators to investigate how our digital twin system can support human teleoperation of rovers in both pre-mission training and in real-time problem solving in a mock lunar mission in which users directed a physical rover in the context of deploying dipole radio antennas. We found that operators who first trained with the digital twin showed a 28% decrease in mission completion time, an 85% decrease in unrecoverable errors, as well as improved mental markers, including decreased cognitive load and increased situation awareness.

Via

Access Paper or Ask Questions

ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

Mar 15, 2025

Yu Fang, Yue Yang, Xinghao Zhu, Kaiyuan Zheng, Gedas Bertasius, Daniel Szafir, Mingyu Ding

Abstract:Vision-language-action (VLA) models present a promising paradigm by training policies directly on real robot datasets like Open X-Embodiment. However, the high cost of real-world data collection hinders further data scaling, thereby restricting the generalizability of VLAs. In this paper, we introduce ReBot, a novel real-to-sim-to-real approach for scaling real robot datasets and adapting VLA models to target domains, which is the last-mile deployment challenge in robot manipulation. Specifically, ReBot replays real-world robot trajectories in simulation to diversify manipulated objects (real-to-sim), and integrates the simulated movements with inpainted real-world background to synthesize physically realistic and temporally consistent robot videos (sim-to-real). Our approach has several advantages: 1) it enjoys the benefit of real data to minimize the sim-to-real gap; 2) it leverages the scalability of simulation; and 3) it can generalize a pretrained VLA to a target domain with fully automated data pipelines. Extensive experiments in both simulation and real-world environments show that ReBot significantly enhances the performance and robustness of VLAs. For example, in SimplerEnv with the WidowX robot, ReBot improved the in-domain performance of Octo by 7.2% and OpenVLA by 21.8%, and out-of-domain generalization by 19.9% and 9.4%, respectively. For real-world evaluation with a Franka robot, ReBot increased the success rates of Octo by 17% and OpenVLA by 20%. More information can be found at: https://yuffish.github.io/rebot/

* Website: https://yuffish.github.io/rebot/

Via

Access Paper or Ask Questions

BOSS: Benchmark for Observation Space Shift in Long-Horizon Task

Feb 21, 2025

Yue Yang, Linfeng Zhao, Mingyu Ding, Gedas Bertasius, Daniel Szafir

Abstract:Robotics has long sought to develop visual-servoing robots capable of completing previously unseen long-horizon tasks. Hierarchical approaches offer a pathway for achieving this goal by executing skill combinations arranged by a task planner, with each visuomotor skill pre-trained using a specific imitation learning (IL) algorithm. However, even in simple long-horizon tasks like skill chaining, hierarchical approaches often struggle due to a problem we identify as Observation Space Shift (OSS), where the sequential execution of preceding skills causes shifts in the observation space, disrupting the performance of subsequent individually trained skill policies. To validate OSS and evaluate its impact on long-horizon tasks, we introduce BOSS (a Benchmark for Observation Space Shift). BOSS comprises three distinct challenges: "Single Predicate Shift", "Accumulated Predicate Shift", and "Skill Chaining", each designed to assess a different aspect of OSS's negative effect. We evaluated several recent popular IL algorithms on BOSS, including three Behavioral Cloning methods and the Visual Language Action model OpenVLA. Even on the simplest challenge, we observed average performance drops of 67%, 35%, 34%, and 54%, respectively, when comparing skill performance with and without OSS. Additionally, we investigate a potential solution to OSS that scales up the training data for each skill with a larger and more visually diverse set of demonstrations, with our results showing it is not sufficient to resolve OSS. The project page is: https://boss-benchmark.github.io/

Via

Access Paper or Ask Questions

ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Oct 21, 2024

Yue Yang, Bryce Ikeda, Gedas Bertasius, Daniel Szafir

Figure 1 for ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Figure 2 for ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Figure 3 for ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Figure 4 for ARCADE: Scalable Demonstration Collection and Generation via Augmented Reality for Imitation Learning

Abstract:Robot Imitation Learning (IL) is a crucial technique in robot learning, where agents learn by mimicking human demonstrations. However, IL encounters scalability challenges stemming from both non-user-friendly demonstration collection methods and the extensive time required to amass a sufficient number of demonstrations for effective training. In response, we introduce the Augmented Reality for Collection and generAtion of DEmonstrations (ARCADE) framework, designed to scale up demonstration collection for robot manipulation tasks. Our framework combines two key capabilities: 1) it leverages AR to make demonstration collection as simple as users performing daily tasks using their hands, and 2) it enables the automatic generation of additional synthetic demonstrations from a single human-derived demonstration, significantly reducing user effort and time. We assess ARCADE's performance on a real Fetch robot across three robotics tasks: 3-Waypoints-Reach, Push, and Pick-And-Place. Using our framework, we were able to rapidly train a policy using vanilla Behavioral Cloning (BC), a classic IL algorithm, which excelled across these three tasks. We also deploy ARCADE on a real household task, Pouring-Water, achieving an 80% success rate.

Via

Access Paper or Ask Questions

Augmented Reality Demonstrations for Scalable Robot Imitation Learning

Mar 20, 2024

Yue Yang, Bryce Ikeda, Gedas Bertasius, Daniel Szafir

Abstract:Robot Imitation Learning (IL) is a widely used method for training robots to perform manipulation tasks that involve mimicking human demonstrations to acquire skills. However, its practicality has been limited due to its requirement that users be trained in operating real robot arms to provide demonstrations. This paper presents an innovative solution: an Augmented Reality (AR)-assisted framework for demonstration collection, empowering non-roboticist users to produce demonstrations for robot IL using devices like the HoloLens 2. Our framework facilitates scalable and diverse demonstration collection for real-world tasks. We validate our approach with experiments on three classical robotics tasks: reach, push, and pick-and-place. The real robot performs each task successfully while replaying demonstrations collected via AR.

Via

Access Paper or Ask Questions

Paper index: Designing an introductory HRI course (workshop at HRI 2024)

Mar 04, 2024

Henny Admoni, Daniel Szafir, Wafa Johal, Anara Sandygulova

Abstract:Human-robot interaction is now an established discipline. Dozens of HRI courses exist at universities worldwide, and some institutions even offer degrees in HRI. However, although many students are being taught HRI, there is no agreed-upon curriculum for an introductory HRI course. In this workshop, we aimed to reach community consensus on what should be covered in such a course. Through interactive activities like panels, breakout discussions, and syllabus design, workshop participants explored the many topics and pedagogical approaches for teaching HRI. This collection of articles submitted to the workshop provides examples of HRI courses being offered worldwide.

Via

Access Paper or Ask Questions

TOBY: A Tool for Exploring Data in Academic Survey Papers

Jun 13, 2023

Tathagata Chakraborti, Jungkoo Kang, Christian Muise, Sarath Sreedharan, Michael Walker, Daniel Szafir, Tom Williams

Abstract:This paper describes TOBY, a visualization tool that helps a user explore the contents of an academic survey paper. The visualization consists of four components: a hierarchical view of taxonomic data in the survey, a document similarity view in the space of taxonomic classes, a network view of citations, and a new paper recommendation tool. In this paper, we will discuss these features in the context of three separate deployments of the tool.

Via

Access Paper or Ask Questions

Dynamic Competency Self-Assessment for Autonomous Agents

Mar 03, 2023

Nicholas Conlon, Nisar R. Ahmed, Daniel Szafir

Abstract:As autonomous robots are deployed in increasingly complex environments, platform degradation, environmental uncertainties, and deviations from validated operation conditions can make it difficult for human partners to understand robot capabilities and limitations. The ability for a robot to self-assess its competency in dynamic and uncertain environments will be a crucial next step in successful human-robot teaming. This work presents and evaluates an Event-Triggered Generalized Outcome Assessment (ET-GOA) algorithm for autonomous agents to dynamically assess task confidence during execution. The algorithm uses a fast online statistical test of the agent's observations and its model predictions to decide when competency assessment is needed. We provide experimental results using ET-GOA to generate competency reports during a simulated delivery task and suggest future research directions for self-assessing agents.

* Accepted to the HRI 2023 workshop on Variable Autonomy for human-robot Teaming (VAT-HRI)

Via

Access Paper or Ask Questions

Generating Executable Action Plans with Environmentally-Aware Language Models

Oct 10, 2022

Maitrey Gramopadhye, Daniel Szafir

Figure 1 for Generating Executable Action Plans with Environmentally-Aware Language Models

Figure 2 for Generating Executable Action Plans with Environmentally-Aware Language Models

Figure 3 for Generating Executable Action Plans with Environmentally-Aware Language Models

Abstract:Large Language Models (LLMs) trained using massive text datasets have recently shown promise in generating action plans for robotic agents from high level text queries. However, these models typically do not consider the robot's environment, resulting in generated plans that may not actually be executable due to ambiguities in the planned actions or environmental constraints. In this paper, we propose an approach to generate environmentally-aware action plans that can be directly mapped to executable agent actions. Our approach involves integrating environmental objects and object relations as additional inputs into LLM action plan generation to provide the system with an awareness of its surroundings, resulting in plans where each generated action is mapped to objects present in the scene. We also design a novel scoring function that, along with generating the action steps and associating them with objects, helps the system disambiguate among object instances and take into account their states. We evaluate our approach using the VirtualHome simulator and the ActivityPrograms knowledge base. Our results show that the action plans generated from our system outperform prior work in terms of their correctness and executability by 5.3% and 8.9% respectively.

Via

Access Paper or Ask Questions

Investigating the Effects of Robot Proficiency Self-Assessment on Trust and Performance

Mar 19, 2022

Nicholas Conlon, Daniel Szafir, Nisar Ahmed

Figure 1 for Investigating the Effects of Robot Proficiency Self-Assessment on Trust and Performance

Figure 2 for Investigating the Effects of Robot Proficiency Self-Assessment on Trust and Performance

Figure 3 for Investigating the Effects of Robot Proficiency Self-Assessment on Trust and Performance

Figure 4 for Investigating the Effects of Robot Proficiency Self-Assessment on Trust and Performance

Abstract:Human-robot teams will soon be expected to accomplish complex tasks in high-risk and uncertain environments. Here, the human may not necessarily be a robotics expert, but will need to establish a baseline understanding of the robot's abilities in order to appropriately utilize and rely on the robot. This willingness to rely, also known as trust, is based partly on the human's belief in the robot's proficiency at a given task. If trust is too high, the human may push the robot beyond its capabilities. If trust is too low, the human may not utilize it when they otherwise could have, wasting precious resources. In this work, we develop and execute an online human-subjects study to investigate how robot proficiency self-assessment reports based on Factorized Machine Self-Confidence affect operator trust and task performance in a grid world navigation task. Additionally we present and analyze a metric for trust level assessment, which measures the allocation of control between an operator and robot when the human teammate is free to switch between teleportation and autonomous control. Our results show that an a priori robot self-assessment report aligns operator trust with robot proficiency, and leads to performance improvements and small increases in self-reported trust.

* Appears in Proceedings of the AAAI SSS-22 Symposium "Closing the Assessment Loop: Communicating Proficiency and Intent in Human-Robot Teaming"

Via

Access Paper or Ask Questions