Abstract:Agent learning from human interaction often relies on explicit signals, but implicit social cues, such as prosody in speech, could provide valuable information for more effective learning. This paper advocates for the integration of prosody as a teaching signal to enhance agent learning from human teachers. Through two exploratory studies--one examining voice feedback in an interactive reinforcement learning setup and the other analyzing restricted audio from human demonstrations in three Atari games--we demonstrate that prosody carries significant information about task dynamics. Our findings suggest that prosodic features, when coupled with explicit feedback, can enhance reinforcement learning outcomes. Moreover, we propose guidelines for prosody-sensitive algorithm design and discuss insights into teaching behavior. Our work underscores the potential of leveraging prosody as an implicit signal for more efficient agent learning, thus advancing human-agent interaction paradigms.
Abstract:Reinforcement Learning (RL) has achieved great success in sequential decision-making problems, but often at the cost of a large number of agent-environment interactions. To improve sample efficiency, methods like Reinforcement Learning from Expert Demonstrations (RLED) introduce external expert demonstrations to facilitate agent exploration during the learning process. In practice, these demonstrations, which are often collected from human users, are costly and hence often constrained to a limited amount. How to select the best set of human demonstrations that is most beneficial for learning therefore becomes a major concern. This paper presents EARLY (Episodic Active Learning from demonstration querY), an algorithm that enables a learning agent to generate optimized queries of expert demonstrations in a trajectory-based feature space. Based on a trajectory-level estimate of uncertainty in the agent's current policy, EARLY determines the optimized timing and content for feature-based queries. By querying episodic demonstrations as opposed to isolated state-action pairs, EARLY improves the human teaching experience and achieves better learning performance. We validate the effectiveness of our method in three simulated navigation tasks of increasing difficulty. The results show that our method is able to achieve expert-level performance for all three tasks with convergence over 30% faster than other baseline methods when demonstrations are generated by simulated oracle policies. The results of a follow-up pilot user study (N=18) further validate that our method can still maintain a significantly better convergence in the case of human expert demonstrators while achieving a better user experience in perceived task load and consuming significantly less human time.
Abstract:Large language models, in particular generative pre-trained transformers (GPTs), show impressive results on a wide variety of language-related tasks. In this paper, we explore ChatGPT's zero-shot ability to perform affective computing tasks using prompting alone. We show that ChatGPT a) performs meaningful sentiment analysis in the Valence, Arousal and Dominance dimensions, b) has meaningful emotion representations in terms of emotion categories and these affective dimensions, and c) can perform basic appraisal-based emotion elicitation of situations based on a prompt-based computational implementation of the OCC appraisal model. These findings are highly relevant: First, they show that the ability to solve complex affect processing tasks emerges from language-based token prediction trained on extensive data sets. Second, they show the potential of large language models for simulating, processing and analyzing human emotions, which has important implications for various applications such as sentiment analysis, socially interactive agents, and social robotics.
Abstract:We present CycleDance, a dance style transfer system to transform an existing motion clip in one dance style to a motion clip in another dance style while attempting to preserve motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for music context. We adopt sequence length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, which is a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study including 30 participants with 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements with the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.
Abstract:Social robots are becoming increasingly diverse in their design, behavior, and usage. In this chapter, we provide a broad-ranging overview of the main characteristics that arise when one considers social robots and their interactions with humans. We specifically contribute a framework for characterizing social robots along 7 dimensions that we found to be most relevant to their design. These dimensions are: appearance, social capabilities, purpose and application area, relational role, autonomy and intelligence, proximity, and temporal profile. Within each dimension, we account for the variety of social robots through a combination of classifications and/or explanations. Our framework builds on and goes beyond existing frameworks, such as classifications and taxonomies found in the literature. More specifically, it contributes to the unification, clarification, and extension of key concepts, drawing from a rich body of relevant literature. This chapter is meant to serve as a resource for researchers, designers, and developers within and outside the field of social robotics. It is intended to provide them with tools to better understand and position existing social robots, as well as to inform their future design.
Abstract:Robot-Assisted Therapy (RAT) has successfully been used in HRI research by including social robots in health-care interventions by virtue of their ability to engage human users in both social and emotional dimensions. Research projects on this topic exist all over the globe, in the USA, Europe, and Asia. All of these projects share the ambitious overall goal of increasing the well-being of a vulnerable population. Typical work in RAT is performed using remote-controlled robots, a technique called Wizard-of-Oz (WoZ). The robot is usually controlled, unbeknownst to the patient, by a human operator. However, WoZ has been shown not to be a sustainable technique in the long term. Providing the robots with autonomy (while remaining under the supervision of the therapist) has the potential to lighten the therapist's burden, not only in the therapeutic session itself but also in longer-term diagnostic tasks. Therefore, there is a need to explore several degrees of autonomy in social robots used in therapy. Increasing the autonomy of robots might also bring about a new set of challenges. In particular, there will be a need to answer new ethical questions regarding the use of robots with a vulnerable population, as well as a need to ensure ethically compliant robot behaviours. In this workshop, we therefore want to gather findings and explore which degree of autonomy might help to improve health-care interventions and how we can overcome the ethical challenges inherent to it.