Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Sean Andrist

SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research

May 16, 2024

Dan Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad

Figure 1 for SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research

Figure 2 for SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research

Figure 3 for SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research

Figure 4 for SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research

Abstract:We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.

* 10 pages, 5 figures

Via

Access Paper or Ask Questions

HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Sep 29, 2023

Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri(+2 more)

Figure 1 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Figure 2 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Figure 3 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Figure 4 for HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Abstract:Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community. This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks. The task performer executes the task while wearing a mixed-reality headset that captures seven synchronized data streams. The task instructor watches the performer's egocentric video in real time and guides them verbally. By augmenting the data with action and conversational annotations and observing the rich behaviors of various participants, we present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment. HoloAssist spans 166 hours of data captured by 350 unique instructor-performer pairs. Furthermore, we construct and present benchmarks on mistake detection, intervention type prediction, and hand forecasting, along with detailed analysis. We expect HoloAssist will provide an important resource for building AI assistants that can fluidly collaborate with humans in the real world. Data can be downloaded at https://holoassist.github.io/.

* ICCV 2023

Via

Access Paper or Ask Questions

Platform for Situated Intelligence

Mar 29, 2021

Dan Bohus, Sean Andrist, Ashley Feniello, Nick Saw, Mihai Jalobeanu, Patrick Sweeney, Anne Loomis Thompson, Eric Horvitz

Figure 1 for Platform for Situated Intelligence

Figure 2 for Platform for Situated Intelligence

Figure 3 for Platform for Situated Intelligence

Figure 4 for Platform for Situated Intelligence

Abstract:We introduce Platform for Situated Intelligence, an open-source framework created to support the rapid development and study of multimodal, integrative-AI systems. The framework provides infrastructure for sensing, fusing, and making inferences from temporal streams of data across different modalities, a set of tools that enable visualization and debugging, and an ecosystem of components that encapsulate a variety of perception and processing technologies. These assets jointly provide the means for rapidly constructing and refining multimodal, integrative-AI systems, while retaining the efficiency and performance characteristics required for deployment in open-world settings.

* 29 pages, 14 figures, Microsoft Research Technical Report

Via

Access Paper or Ask Questions

Accelerating the Development of Multimodal, Integrative-AI Systems with Platform for Situated Intelligence

Oct 12, 2020

Sean Andrist, Dan Bohus

Figure 1 for Accelerating the Development of Multimodal, Integrative-AI Systems with Platform for Situated Intelligence

Abstract:We describe Platform for Situated Intelligence, an open-source framework for multimodal, integrative-AI systems. The framework provides infrastructure, tools, and components that enable and accelerate the development of applications that process multimodal streams of data and in which timing is critical. The framework is particularly well-suited for developing physically situated interactive systems that perceive and reason about their surroundings in order to better interact with people, such as social robots, virtual assistants, smart meeting rooms, etc. In this paper, we provide a brief, high-level overview of the framework and its main affordances, and discuss its implications for HRI.

* 5 pages, 1 figure. Submitted to the 2020 AAAI Fall Symposium: Trust and Explainability in Artificial Intelligence for Human-Robot Interaction

Via

Access Paper or Ask Questions

REFORM: Recognizing F-formations for Social Robots

Aug 17, 2020

Hooman Hedayati, Annika Muehlbradt, Daniel J. Szafir, Sean Andrist

Figure 1 for REFORM: Recognizing F-formations for Social Robots

Figure 2 for REFORM: Recognizing F-formations for Social Robots

Figure 3 for REFORM: Recognizing F-formations for Social Robots

Figure 4 for REFORM: Recognizing F-formations for Social Robots

Abstract:Recognizing and understanding conversational groups, or F-formations, is a critical task for situated agents designed to interact with humans. F-formations contain complex structures and dynamics, yet are used intuitively by people in everyday face-to-face conversations. Prior research exploring ways of identifying F-formations has largely relied on heuristic algorithms that may not capture the rich dynamic behaviors employed by humans. We introduce REFORM (REcognize F-FORmations with Machine learning), a data-driven approach for detecting F-formations given human and agent positions and orientations. REFORM decomposes the scene into all possible pairs and then reconstructs F-formations with a voting-based scheme. We evaluated our approach across three datasets: the SALSA dataset, a newly collected human-only dataset, and a new set of acted human-robot scenarios, and found that REFORM yielded improved accuracy over a state-of-the-art F-formation detection algorithm. We also introduce symmetry and tightness as quantitative measures to characterize F-formations. Supplementary video: https://youtu.be/Fp7ETdkKvdA , Dataset available at: github.com/cu-ironlab/Babble

* IROS 2020

Via

Access Paper or Ask Questions

Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

May 12, 2019

Aditya Modi, Debadeepta Dey, Alekh Agarwal, Adith Swaminathan, Besmira Nushi, Sean Andrist, Eric Horvitz

Figure 1 for Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

Figure 2 for Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

Figure 3 for Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

Figure 4 for Metareasoning in Modular Software Systems: On-the-Fly Configuration using Reinforcement Learning with Rich Contextual Representations

Abstract:Assemblies of modular subsystems are being pressed into service to perform sensing, reasoning, and decision making in high-stakes, time-critical tasks in such areas as transportation, healthcare, and industrial automation. We address the opportunity to maximize the utility of an overall computing system by employing reinforcement learning to guide the configuration of the set of interacting modules that comprise the system. The challenge of doing system-wide optimization is a combinatorial problem. Local attempts to boost the performance of a specific module by modifying its configuration often leads to losses in overall utility of the system's performance as the distribution of inputs to downstream modules changes drastically. We present metareasoning techniques which consider a rich representation of the input, monitor the state of the entire pipeline, and adjust the configuration of modules on-the-fly so as to maximize the utility of a system's operation. We show significant improvement in both real-world and synthetic pipelines across a variety of reinforcement learning techniques.

* 12 pages, 7 figures, 2 tables

Via

Access Paper or Ask Questions