Abstract:Large language models have enabled agents of all kinds to interact with users through natural conversation. Consequently, agents now have two jobs: conversing and planning/reasoning. Their conversational responses must be informed by all available information, and their actions must help to achieve goals. This dichotomy between conversing with the user and doing multi-step reasoning and planning can be seen as analogous to the human systems of "thinking fast and slow" introduced by Kahneman. Our approach comprises a "Talker" agent (System 1) that is fast and intuitive, and tasked with synthesizing the conversational response; and a "Reasoner" agent (System 2) that is slower, more deliberative, and more logical, and is tasked with multi-step reasoning and planning, calling tools, and performing actions in the world, thereby producing the new agent state. We describe the new Talker-Reasoner architecture and discuss its advantages, including modularity and decreased latency. We ground the discussion in the context of a sleep coaching agent, in order to demonstrate real-world relevance.
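As a rough illustration of the division of labor described above, the sketch below pairs a fast Talker call with a slower Reasoner that plans, calls tools, and updates a shared agent state. The class and function names, and the `call_llm` and `tools` placeholders, are assumptions made for illustration, not the paper's actual interface.

```python
# Minimal sketch of a Talker-Reasoner style turn (assumed structure).
from dataclasses import dataclass, field

@dataclass
class AgentState:
    """Shared memory that the Reasoner writes and the Talker reads."""
    beliefs: dict = field(default_factory=dict)   # e.g., user goals, coaching plan
    history: list = field(default_factory=list)   # conversation transcript

def call_llm(prompt: str) -> str:
    """Placeholder for a fast or slow LLM call (assumption, not the paper's API)."""
    raise NotImplementedError

def talker(state: AgentState, user_msg: str) -> str:
    # System 1: fast, intuitive reply conditioned on the latest state snapshot.
    prompt = f"State: {state.beliefs}\nHistory: {state.history}\nUser: {user_msg}\nReply:"
    return call_llm(prompt)

def reasoner(state: AgentState, user_msg: str, tools: dict) -> AgentState:
    # System 2: slow, deliberative multi-step planning; may call tools and
    # overwrites the shared state that future Talker turns will read.
    plan = call_llm(f"Plan next steps given beliefs {state.beliefs} and message: {user_msg}")
    for step in plan.splitlines():
        name = step.split(":")[0].strip()
        if name in tools:
            state.beliefs[name] = tools[name](step)
    return state

def handle_turn(state: AgentState, user_msg: str, tools: dict) -> str:
    reply = talker(state, user_msg)          # low-latency response to the user
    state.history.append((user_msg, reply))
    reasoner(state, user_msg, tools)         # could run asynchronously to hide latency
    return reply
```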
Abstract:Understanding and respecting personal space preferences is essential for socially assistive robots designed for older adult users. This work introduces and evaluates a novel personalized context-aware method for modeling users' proxemics preferences during human-robot interactions. Using an interactive augmented reality interface, we collected a set of user-preferred distances from the robot and employed an active transfer learning approach to fine-tune a specialized deep learning model. We evaluated this approach through two user studies: 1) a convenience population study (N = 24) to validate the efficacy of the active transfer learning approach; and 2) a user study involving older adults (N = 15) to assess the system's usability. We compared the data collected with the augmented reality interface and with the physical robot to examine the relationship between proxemics preferences for a virtual robot versus a physically embodied robot. We found that fine-tuning significantly improved model performance: on average, the error in testing decreased by 26.97% after fine-tuning. The system was well-received by older adult participants, who provided valuable feedback and suggestions for future work.
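A minimal sketch of the kind of per-user adaptation described above, assuming a small PyTorch regressor: a generic pretrained model is fine-tuned on a handful of AR-collected preferred distances, and the next query is chosen by predictive uncertainty (here, MC-dropout variance). The feature dimensions, architecture, and acquisition rule are illustrative assumptions, not the study's implementation.

```python
import torch
import torch.nn as nn

class ProxemicsNet(nn.Module):
    """Toy model mapping context features to a preferred distance (meters)."""
    def __init__(self, n_features: int = 8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, 32), nn.ReLU(),
            nn.Dropout(0.2),              # dropout doubles as an uncertainty proxy
            nn.Linear(32, 1),
        )

    def forward(self, x):
        return self.net(x)

def fine_tune(model, contexts, distances, epochs: int = 50, lr: float = 1e-3):
    """Adapt a generic pretrained model to one user's collected preferences."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(contexts).squeeze(-1), distances)
        loss.backward()
        opt.step()
    return model

def next_query(model, candidate_contexts, n_samples: int = 20):
    """Active learning step: pick the candidate with the highest MC-dropout variance."""
    model.train()                          # keep dropout active while sampling
    with torch.no_grad():
        preds = torch.stack([model(candidate_contexts).squeeze(-1)
                             for _ in range(n_samples)])
    return int(preds.var(dim=0).argmax())
```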
Abstract:Perceptions of gender are a significant aspect of human-human interaction, and gender has wide-reaching social implications for robots deployed in contexts where they are expected to interact with humans. This work explored two flexible modalities for communicating gender in robots, voice and appearance, and studied their individual and combined influences on a robot's perceived gender. We evaluated the perception of a robot's gender through three video-based studies. First, we conducted a study (n=65) on the gender perception of robot voices by varying speaker identity and pitch. Second, we conducted a study (n=93) on the gender perception of robot clothing designed for two different tasks. Finally, building on the results of the first two studies, we completed a large integrative video-based study (n=273) involving two human-robot interaction tasks. We found that voice and clothing can be used to reliably establish a robot's perceived gender, and that combining these two modalities can have different effects on the robot's perceived gender. Taken together, these results inform the design of robot voices and clothing as individual and interacting components in the perception of robot gender.
Abstract:Adaptive training programs are crucial for post-stroke recovery. However, developing programs that automatically adapt depends on quantifying how difficult a task is for a specific individual at a particular stage of their recovery. In this work, we propose a method that automatically generates regions of different task difficulty levels based on an individual's performance. We show that this technique explains the variance in user performance for a reaching task better than previous approaches to estimating task difficulty.
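One hypothetical way to derive difficulty regions from performance data is sketched below with NumPy: bin reach targets into a grid, estimate a per-cell success rate, and threshold it into easy/medium/hard regions. The grid resolution and thresholds are placeholders; this is not a reproduction of the paper's region-generation method.

```python
import numpy as np

def difficulty_regions(targets, successes, grid=10, thresholds=(0.75, 0.4)):
    """targets: (N, 2) reach positions scaled to [0, 1]^2; successes: (N,) in {0, 1}.
    Returns a grid labeled 0 = easy, 1 = medium, 2 = hard (NaN where no data)."""
    targets = np.asarray(targets, dtype=float)
    successes = np.asarray(successes, dtype=float)
    rate = np.full((grid, grid), np.nan)
    ix = np.clip((targets[:, 0] * grid).astype(int), 0, grid - 1)
    iy = np.clip((targets[:, 1] * grid).astype(int), 0, grid - 1)
    for i in range(grid):
        for j in range(grid):
            mask = (ix == i) & (iy == j)
            if mask.any():
                rate[i, j] = successes[mask].mean()   # empirical success rate per cell
    easy_hi, medium_hi = thresholds
    labels = np.full_like(rate, np.nan)
    labels[rate >= easy_hi] = 0
    labels[(rate < easy_hi) & (rate >= medium_hi)] = 1
    labels[rate < medium_hi] = 2
    return labels
```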
Abstract:Users develop mental models of robots to conceptualize what kinds of interactions they can have with those robots. These conceptualizations are often formed before any interaction with the robot and are based only on observing the robot's physical design. As a result, understanding the conceptualizations formed from physical design is necessary to understand how users intend to interact with the robot. We propose to use multimodal features of robot embodiments to predict what kinds of expectations users will have about a given robot's social and physical capabilities. We show that these features provide information about users' general mental models of robots and that those models generalize across socially interactive robots. We describe how these models can be incorporated into interaction design and physical design for researchers working with socially interactive robots.
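As an illustrative sketch of this prediction setup (assumed, not the paper's code): concatenate multimodal embodiment features, regress users' expectation ratings, and check generalization to unseen robots with leave-one-robot-out cross-validation.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

def evaluate_generalization(visual_feats, shape_feats, ratings, robot_ids):
    """Leave-one-robot-out evaluation: does the model generalize to unseen robots?
    Feature groups and the rating scale are placeholders for illustration."""
    X = np.hstack([visual_feats, shape_feats])      # simple multimodal fusion
    model = Ridge(alpha=1.0)
    scores = cross_val_score(model, X, ratings,
                             groups=robot_ids, cv=LeaveOneGroupOut(),
                             scoring="neg_mean_absolute_error")
    return -scores.mean()                           # mean absolute error across held-out robots
```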
Abstract:College students with ADHD respond positively to simple socially assistive robots (SARs) that monitor attention and provide non-verbal feedback, but studies have been done only in brief in-lab sessions. We present an initial design and evaluation of an in-dorm SAR study companion for college students with ADHD. This work represents the introductory stages of an ongoing user-centered, participatory design process. In a three-week within-subjects user study, university students (N=11) with self-reported symptoms of adult ADHD had a SAR study companion in their dorm room for two weeks and a computer-based system for one week. Toward developing SARs for long-term, in-dorm use, we focus on 1) evaluating the usability and desire for SAR study companions by college students with ADHD and 2) collecting participant feedback about the SAR design and functionality. Participants responded positively to the robot; after one week of regular use, 91% (10 of 11) chose to continue using the robot voluntarily in the second week.
Abstract:Mindfulness-based therapies have been shown to be effective in improving mental health, and technology-based methods have the potential to expand the accessibility of these therapies. To enable real-time personalized content generation for mindfulness practice in these methods, high-quality computer-synthesized text-to-speech (TTS) voices are needed to provide verbal guidance and respond to user performance and preferences. However, the user-perceived quality of state-of-the-art TTS voices has not yet been evaluated for administering mindfulness meditation, which requires emotional expressiveness. In addition, the effect of physical embodiment and personalization on the user-perceived quality of TTS voices for mindfulness has not yet been studied. To that end, we designed a two-phase human subject study. In Phase 1, an online Mechanical Turk between-subject study (N=471) with remote participants compared 3 state-of-the-art TTS voices (feminine, masculine, child-like) with 2 human therapists' voices (feminine, masculine) in 3 different physical embodiment settings (no agent, conversational agent, socially assistive robot). Building on findings from Phase 1, in Phase 2 we conducted an in-person within-subject study (N=94) in which we used a novel framework we developed for personalizing TTS voices based on user preferences, and evaluated their user-perceived quality against the best-rated non-personalized voices from Phase 1. We found that the best-rated human voice was perceived as better than all TTS voices; the emotional expressiveness and naturalness of TTS voices were rated poorly, while users were satisfied with their clarity. Surprisingly, by allowing users to fine-tune TTS voice features, the user-personalized TTS voices could perform almost as well as human voices, suggesting that user personalization could be a simple and very effective tool to improve the user-perceived quality of TTS voices.
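A minimal sketch of the personalization idea, assuming a parametric TTS backend exposed through a hypothetical `synthesize` hook: the user nudges a few prosody parameters until satisfied. The parameter names and ranges are assumptions for illustration, not the framework's actual controls.

```python
from dataclasses import dataclass

@dataclass
class VoiceParams:
    pitch_shift: float = 0.0   # semitones relative to the base voice
    rate: float = 1.0          # speaking-rate multiplier
    warmth: float = 0.5        # expressiveness knob in [0, 1]

def synthesize(text: str, params: VoiceParams) -> bytes:
    """Placeholder for any TTS call that accepts prosody controls."""
    raise NotImplementedError

def personalize(text: str, params: VoiceParams, get_feedback) -> VoiceParams:
    """Iteratively adjust parameters from user feedback such as ('pitch_shift', +1.0);
    get_feedback returns None once the user is satisfied (e.g., via a slider GUI)."""
    while True:
        audio = synthesize(text, params)
        feedback = get_feedback(audio)
        if feedback is None:
            return params
        attr, delta = feedback
        setattr(params, attr, getattr(params, attr) + delta)
```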
Abstract:Robots that cooperate with humans must be effective at communicating with them. However, people have varied preferences for communication based on many contextual factors, such as culture, environment, and past experience. To communicate effectively, robots must take those factors into consideration. In this work, we present the Robot Signal Design (RoSiD) tool to empower people to easily self-specify communicative preferences for collaborative robots. We show through a participatory design study that the RoSiD tool enables users to create signals that align with their communicative preferences, and we illuminate how this tool can be further improved.
Abstract:This paper describes a between-subjects Amazon Mechanical Turk study (n = 220) that investigated how a robot's affective narrative influences its ability to elicit empathy in human observers. We first conducted a pilot study to develop and validate the robot's affective narratives. Then, in the full study, the robot used one of three different affective narrative strategies (funny, sad, neutral) while becoming less functional at its shopping task over the course of the interaction. As the robot's functionality degraded, participants were repeatedly asked if they were willing to help the robot. The results showed that conveying a sad narrative significantly influenced participants' willingness to help the robot and whether they felt empathy toward it throughout the interaction. Furthermore, participants with more past experience with robots were also more willing to help the robot. This work suggests that affective narratives can be useful in short-term interactions that benefit from emotional connections between humans and robots.
Abstract:The advent of large language models (LLMs) has revolutionized natural language processing, enabling the generation of coherent and contextually relevant text. As LLMs increasingly power conversational agents, the synthesized personality embedded in these models by virtue of their training on large amounts of human-generated data has drawn attention. Since personality is an important factor determining the effectiveness of communication, we present a comprehensive method for administering validated psychometric tests and quantifying, analyzing, and shaping personality traits exhibited in text generated from widely-used LLMs. We find that: 1) personality simulated in the outputs of some LLMs (under specific prompting configurations) is reliable and valid; 2) evidence of reliability and validity of LLM-simulated personality is stronger for larger and instruction fine-tuned models; and 3) personality in LLM outputs can be shaped along desired dimensions to mimic specific personality profiles. We also discuss potential applications and ethical implications of our measurement and shaping framework, especially regarding responsible use of LLMs.
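A simplified sketch of the measurement-and-shaping loop, assuming a generic `generate` placeholder for the LLM call: prepend a persona prompt, administer Likert-scale items, and aggregate per-trait scores. The items and prompts shown are illustrative examples, not the validated psychometric instruments used in the work.

```python
from statistics import mean

# Example items: (trait, statement, reverse_keyed); purely illustrative.
ITEMS = [
    ("extraversion", "I am the life of the party.", False),
    ("extraversion", "I don't talk a lot.", True),
]

def generate(prompt: str) -> str:
    """Placeholder for an LLM call that returns a rating such as '4'."""
    raise NotImplementedError

def administer(persona: str) -> dict:
    """Return mean item scores per trait under a shaping persona prompt."""
    scores = {}
    for trait, item, reverse in ITEMS:
        prompt = (f"{persona}\nRate how well the statement describes you on a "
                  f"scale from 1 (disagree strongly) to 5 (agree strongly).\n"
                  f"Statement: {item}\nAnswer with a single number:")
        rating = int(generate(prompt).strip()[0])
        scores.setdefault(trait, []).append(6 - rating if reverse else rating)
    return {trait: mean(vals) for trait, vals in scores.items()}

# Shaping: vary the persona to target a trait level, e.g.
# administer("You are an extremely extraverted, outgoing person.")
# administer("You are an extremely introverted, reserved person.")
```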