Abstract:To enable effective human-AI collaboration, merely optimizing AI performance while ignoring humans is not sufficient. Recent research has demonstrated that designing AI agents to account for human behavior leads to improved performance in human-AI collaboration. However, a limitation of most existing approaches is their assumption that human behavior is static, irrespective of AI behavior. In reality, humans may adjust their action plans based on their observations of AI behavior. In this paper, we address this limitation by enabling a collaborative AI agent to consider the beliefs of its human partner, i.e., what the human partner thinks the AI agent is doing, and design its action plan to facilitate easier collaboration with its human partner. Specifically, we developed a model of human beliefs that accounts for how humans reason about the behavior of their AI partners. Based on this belief model, we then developed an AI agent that considers both human behavior and human beliefs in devising its strategy for working with humans. Through extensive real-world human-subject experiments, we demonstrated that our belief model more accurately predicts humans' beliefs about AI behavior. Moreover, we showed that our design of AI agents that accounts for human beliefs enhances performance in human-AI collaboration.
Abstract:Goal recognition design aims to make limited modifications to decision-making environments with the goal of making it easier to infer the goals of agents acting within those environments. Although various research efforts have been made in goal recognition design, existing approaches are computationally demanding and often assume that agents are (near-)optimal in their decision-making. To address these limitations, we introduce a data-driven approach to goal recognition design that can account for agents with general behavioral models. Following existing literature, we use worst-case distinctiveness ($\textit{wcd}$) as a measure of the difficulty in inferring the goal of an agent in a decision-making environment. Our approach begins by training a machine learning model to predict the $\textit{wcd}$ for a given environment and the agent behavior model. We then propose a gradient-based optimization framework that accommodates various constraints to optimize decision-making environments for enhanced goal recognition. Through extensive simulations, we demonstrate that our approach outperforms existing methods in reducing $\textit{wcd}$ and enhancing runtime efficiency in conventional setups, and it also adapts to scenarios not previously covered in the literature, such as those involving flexible budget constraints, more complex environments, and suboptimal agent behavior. Moreover, we have conducted human-subject experiments which confirm that our method can create environments that facilitate efficient goal recognition from real-world human decision-makers.
Abstract:Cognitive modeling commonly relies on asking participants to complete a battery of varied tests in order to estimate attention, working memory, and other latent variables. In many cases, these tests result in highly variable observation models. A near-ubiquitous approach is to repeat many observations for each test, resulting in a distribution over the outcomes from each test given to each subject. In this paper, we explore the usage of latent variable modeling to enable learning across many correlated variables simultaneously. We extend latent variable models (LVMs) to the setting where observed data for each subject are a series of observations from many different distributions, rather than simple vectors to be reconstructed. By embedding test battery results for individuals in a latent space that is trained jointly across a population, we are able to leverage correlations both between tests for a single participant and between multiple participants. We then propose an active learning framework that leverages this model to conduct more efficient cognitive test batteries. We validate our approach by demonstrating with real-time data acquisition that it performs comparably to conventional methods in making item-level predictions with fewer test items.