Dan Bohus

SIGMA: An Open-Source Interactive System for Mixed-Reality Task Assistance Research

May 16, 2024

Dan Bohus, Sean Andrist, Nick Saw, Ann Paradiso, Ishani Chakraborty, Mahdi Rad

Abstract: We introduce an open-source system called SIGMA (short for "Situated Interactive Guidance, Monitoring, and Assistance") as a platform for conducting research on task-assistive agents in mixed-reality scenarios. The system leverages the sensing and rendering affordances of a head-mounted mixed-reality device in conjunction with large language and vision models to guide users step by step through procedural tasks. We present the system's core capabilities, discuss its overall design and implementation, and outline directions for future research enabled by the system. SIGMA is easily extensible and provides a useful basis for future research at the intersection of mixed reality and AI. By open-sourcing an end-to-end implementation, we aim to lower the barrier to entry, accelerate research in this space, and chart a path towards community-driven end-to-end evaluation of large language, vision, and multimodal models in the context of real-world interactive applications.

* 10 pages, 5 figures

HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Sep 29, 2023

Xin Wang, Taein Kwon, Mahdi Rad, Bowen Pan, Ishani Chakraborty, Sean Andrist, Dan Bohus, Ashley Feniello, Bugra Tekin, Felipe Vieira Frujeri (+2 more)

Abstract: Building an interactive AI assistant that can perceive, reason, and collaborate with humans in the real world has been a long-standing pursuit in the AI community. This work is part of a broader research effort to develop intelligent agents that can interactively guide humans through performing tasks in the physical world. As a first step in this direction, we introduce HoloAssist, a large-scale egocentric human interaction dataset, where two people collaboratively complete physical manipulation tasks. The task performer executes the task while wearing a mixed-reality headset that captures seven synchronized data streams. The task instructor watches the performer's egocentric video in real time and guides them verbally. By augmenting the data with action and conversational annotations and observing the rich behaviors of various participants, we present key insights into how human assistants correct mistakes, intervene in the task completion procedure, and ground their instructions to the environment. HoloAssist spans 166 hours of data captured by 350 unique instructor-performer pairs. Furthermore, we construct and present benchmarks on mistake detection, intervention type prediction, and hand forecasting, along with detailed analysis. We expect HoloAssist will provide an important resource for building AI assistants that can fluidly collaborate with humans in the real world. Data can be downloaded at https://holoassist.github.io/.

* ICCV 2023

Platform for Situated Intelligence

Mar 29, 2021

Dan Bohus, Sean Andrist, Ashley Feniello, Nick Saw, Mihai Jalobeanu, Patrick Sweeney, Anne Loomis Thompson, Eric Horvitz

Abstract: We introduce Platform for Situated Intelligence, an open-source framework created to support the rapid development and study of multimodal, integrative-AI systems. The framework provides infrastructure for sensing, fusing, and making inferences from temporal streams of data across different modalities, a set of tools that enable visualization and debugging, and an ecosystem of components that encapsulate a variety of perception and processing technologies. These assets jointly provide the means for rapidly constructing and refining multimodal, integrative-AI systems, while retaining the efficiency and performance characteristics required for deployment in open-world settings.

* 29 pages, 14 figures, Microsoft Research Technical Report

Accelerating the Development of Multimodal, Integrative-AI Systems with Platform for Situated Intelligence

Oct 12, 2020

Sean Andrist, Dan Bohus

Abstract: We describe Platform for Situated Intelligence, an open-source framework for multimodal, integrative-AI systems. The framework provides infrastructure, tools, and components that enable and accelerate the development of applications that process multimodal streams of data and in which timing is critical. The framework is particularly well-suited for developing physically situated interactive systems that perceive and reason about their surroundings in order to better interact with people, such as social robots, virtual assistants, smart meeting rooms, etc. In this paper, we provide a brief, high-level overview of the framework and its main affordances, and discuss its implications for HRI.

* 5 pages, 1 figure. Submitted to the 2020 AAAI Fall Symposium: Trust and Explainability in Artificial Intelligence for Human-Robot Interaction
