Division of Robotics, Perception and Learning, KTH - Royal Institute of Technology, Stockholm, Sweden
Abstract: Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities like facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused using additive fusion and processed by a shared transformer encoder to capture temporal dependencies and interactions. The proposed method is evaluated on a dataset collected from participants engaged in a tangible tabletop Pacman game designed to induce various affective states. Our results demonstrate the effectiveness of incorporating contextual information and multimodal fusion for affective state recognition.
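To make the fusion scheme above concrete, here is a minimal PyTorch-style sketch of modality-specific encoders, additive fusion, and a shared transformer encoder. The class name, input dimensions, layer sizes, and the four-class output are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class MultimodalFusion(nn.Module):
    """Sketch: per-modality encoders, additive fusion, shared transformer."""
    def __init__(self, dims, d_model=64, n_classes=4):
        super().__init__()
        # One encoder per modality (e.g., thermal, action units, text context),
        # mapping each input feature vector into a shared d_model space.
        self.encoders = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, inputs):
        # inputs: list of (batch, time, dim_i) tensors, one per modality.
        encoded = [enc(x) for enc, x in zip(self.encoders, inputs)]
        fused = torch.stack(encoded, dim=0).sum(dim=0)  # additive fusion
        out = self.transformer(fused)                   # temporal modeling
        return self.head(out.mean(dim=1))               # sequence-level logits

# Toy example: 8-d thermal, 17-d facial action units, 32-d text context.
model = MultimodalFusion(dims=[8, 17, 32])
x = [torch.randn(2, 10, d) for d in (8, 17, 32)]
print(model(x).shape)  # torch.Size([2, 4])
```

Reusing one transformer after additive fusion keeps the temporal model shared across modalities, which is the design the abstract describes; everything else here is a placeholder.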
Abstract: Behavior Trees (BTs) were first conceived in the computer games industry as a tool to model agent behavior, but they have also received interest in the robotics community as an alternative policy design to Finite State Machines (FSMs). The advantages of BTs over FSMs have been highlighted in many works, but there is no thorough practical comparison of the two designs. Such a comparison is particularly relevant in the robotics industry, where FSMs have been the state-of-the-art policy representation for robot control for many years. In this work we shed light on this matter by comparing how BTs and FSMs behave when controlling a robot in a mobile manipulation task. The comparison is made in terms of reactivity, modularity, readability, and design. We propose metrics for each of these properties, being aware that while some are tangible and objective, others are more subjective and implementation dependent. The practical comparison is performed in a simulation environment with validation on a real robot. We find that although the robot's behavior during task solving is independent of the policy representation, maintaining a BT rather than an FSM becomes easier as the task increases in complexity.
Abstract: In industrial applications, Finite State Machines (FSMs) are often used to implement decision-making policies for autonomous systems. In recent years, the use of Behavior Trees (BTs) as an alternative policy representation has gained considerable attention. The benefits of using BTs over FSMs are modularity and reusability, enabling a system that is easy to extend and modify. However, there exist few published studies on successful implementations of BTs for industrial applications. This paper contributes the lessons learned from implementing BTs in a complex industrial use case, where a robotic system assembles explosive charges and places them in holes on the rock face. The main result of the paper is that even if it is possible to model the entire system as a BT, combining BTs with FSMs can increase the readability and maintainability of the system. The benefit of such a combination is especially notable in the use case studied in this paper, where the full system cannot run autonomously but human supervision and feedback are needed.
Abstract: Robotic systems for manipulation tasks are increasingly expected to be easy to configure for new tasks. While in the past robot programs were often written statically and tuned manually, today's faster transition times call for robust, modular, and interpretable solutions that also allow a robotic system to learn how to perform a task. We propose Behavior-based Bayesian Optimization and Planning (BeBOP), a method that combines two approaches for generating behavior trees: we build the structure using a reactive planner and learn specific parameters with Bayesian optimization. The method is evaluated on a set of robotic manipulation benchmarks and is shown to outperform state-of-the-art reinforcement learning algorithms by being up to 46 times faster while simultaneously being less dependent on reward shaping. We also propose a modification to the uncertainty estimate for the random forest surrogate models that drastically improves the results.
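As a rough illustration of the parameter-learning half of BeBOP, the sketch below runs Bayesian optimization with a random forest surrogate over a hypothetical two-dimensional space of BT leaf parameters. The episode_reward function is a stand-in for executing the tree in simulation, and the across-trees standard deviation used in the acquisition function is the naive estimate, not the paper's modified one.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

def episode_reward(params):
    # Hypothetical stand-in for running the BT-controlled task once with
    # the given leaf parameters (e.g., a grasp offset) and returning reward.
    return -np.sum((params - 0.3) ** 2)

# Initial random design over a 2-d parameter space in [0, 1]^2.
X = rng.uniform(0, 1, size=(5, 2))
y = np.array([episode_reward(x) for x in X])

for _ in range(20):
    surrogate = RandomForestRegressor(n_estimators=100).fit(X, y)
    cand = rng.uniform(0, 1, size=(256, 2))
    # Upper confidence bound: mean + k * std across the forest's trees
    # (the naive uncertainty estimate the paper proposes to improve on).
    preds = np.stack([t.predict(cand) for t in surrogate.estimators_])
    ucb = preds.mean(axis=0) + 1.0 * preds.std(axis=0)
    x_next = cand[np.argmax(ucb)]
    X = np.vstack([X, x_next])
    y = np.append(y, episode_reward(x_next))

print("best params:", X[np.argmax(y)], "reward:", y.max())
```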
Abstract: Despite significant improvements in their capabilities, robots are likely to fail in human-robot collaborative tasks due to high unpredictability in human environments and varying human expectations. In this work, we explore the role of a robot's explanation of failures in a human-robot collaborative task. We present a user study incorporating common failures in collaborative tasks, with human assistance to resolve them. In the study, a robot and a human work together to fill a shelf with objects. Upon encountering a failure, the robot explains the failure and the resolution to overcome it, either through handovers or through the human completing the task. The study is conducted using different levels of robotic explanation, based on the failure action, failure cause, and action history, and different strategies for providing the explanation over the course of repeated interaction. Our results show that success in resolving the failures is not only a function of the level of explanation but also of the type of failure. Furthermore, while novice users rate the robot higher overall in terms of their satisfaction with the explanation, their satisfaction is not only a function of the robot's explanation level at a certain round but also of the prior information they received from the robot.
Abstract: Handovers are basic yet sophisticated motor tasks performed seamlessly by humans. They are among the most common activities in our daily lives and social environments. This makes mastering the art of handovers critical for a social and collaborative robot. In this work, we present an experimental study of human-human handovers involving 13 pairs, i.e., 26 participants. We record and explore multiple features of handovers amongst humans, aimed at inspiring handovers between humans and robots. With this work, we further create and publish a novel data set of 8672 handovers, bringing together human motion and the forces involved. We further analyze the effect of object weight and the role of visual sensory input in human-human handovers, as well as possible design implications for robots. As a proof of concept, the data set was used to create a human-inspired, data-driven strategy for robotic grip release in handovers, which was demonstrated to result in better robot-to-human handovers.
Abstract: Despite great advances in what robots can do, they still experience failures in human-robot collaborative tasks due to high randomness in unstructured human environments. Moreover, a human's unfamiliarity with a robot and its abilities can cause such failures to repeat. This makes the ability to explain failures very important for a robot. In this work, we describe a user study that incorporated different robotic failures in a human-robot collaboration (HRC) task aimed at filling a shelf. We included different types of failures and repeated occurrences of such failures in a prolonged interaction between humans and robots. The failure resolution involved human intervention in the form of bidirectional human-robot handovers. Through such studies, we aim to test different explanation types and explanation progression in the interaction and record human responses.
Abstract: Handovers frequently occur in our social environments, making it imperative for a collaborative robotic system to master the skill of handover. In this work, we investigate the relationship between the grip force variation of a human giver and the sensed interaction force-torque in human-human handovers, using a data-driven approach. A Long Short-Term Memory (LSTM) network was trained to predict the human grip force variation in advance from the interaction force-torque in a handover. Further, we propose to use the trained network to produce human-like grip force variation for a robotic giver.
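A minimal sketch of the kind of model the abstract describes: an LSTM mapping a six-axis interaction force-torque sequence to a predicted grip force trajectory. The network sizes, sequence lengths, and random training targets below are placeholders for illustration only, not the paper's architecture or data.

```python
import torch
import torch.nn as nn

class GripForcePredictor(nn.Module):
    """Sketch: map a force-torque sequence to a grip force sequence."""
    def __init__(self, in_dim=6, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)  # scalar grip force per time step

    def forward(self, ft_seq):
        # ft_seq: (batch, time, 6) wrist force-torque readings
        h, _ = self.lstm(ft_seq)
        return self.out(h)  # (batch, time, 1) predicted grip force

model = GripForcePredictor()
ft = torch.randn(4, 100, 6)    # 4 handovers, 100 samples each (dummy data)
pred = model(ft)
loss = nn.functional.mse_loss(pred, torch.randn(4, 100, 1))
loss.backward()                # one standard supervised training step
```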
Abstract: In modern industrial collaborative robotic applications, it is desirable to create robot programs automatically, intuitively, and time-efficiently. Moreover, robots need to be controlled by reactive policies to face the unpredictability of the environment they operate in. In this paper we propose a framework for collaborative robotic applications that combines a method that learns Behavior Trees (BTs) from demonstration with a method that evolves them with Genetic Programming (GP). The main contribution of this paper is to show that by combining the two learning methods we obtain a method that allows non-expert users to semi-automatically, time-efficiently, and interactively generate BTs. We validate the framework with a series of manipulation experiments. The BT is fully learnt in simulation and then transferred to a real collaborative robot.
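For intuition only, the toy sketch below evolves flat sequences of leaf actions rather than full trees, with the population seeded from a "demonstrated" sequence, to show the demonstration-plus-GP idea in miniature. The action set, fitness function, and genetic operators are hypothetical and far simpler than the paper's tree-based GP.

```python
import random
random.seed(0)

ACTIONS = ["approach", "grasp", "lift", "place"]  # hypothetical leaf set

def fitness(seq):
    # Stand-in for executing the BT in simulation: reward matching the
    # (unknown to the learner) correct ordering of manipulation actions.
    target = ["approach", "grasp", "lift", "place"]
    return sum(a == b for a, b in zip(seq, target))

def mutate(seq):
    seq = list(seq)
    seq[random.randrange(len(seq))] = random.choice(ACTIONS)
    return seq

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

# Seed the population from a demonstrated sequence instead of random ones.
demo = ["approach", "grasp", "place", "place"]
pop = [mutate(demo) for _ in range(20)]

for _ in range(30):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]  # elitist selection
    pop = parents + [mutate(crossover(*random.sample(parents, 2)))
                     for _ in range(10)]

print(max(pop, key=fitness))
```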
Abstract: Behavior Trees are a task-switching policy representation that can grant reactivity and fault tolerance. Moreover, because of their structure and modularity, a variety of methods can be used to generate them automatically. In this short paper we introduce Behavior Trees in the context of robotic applications, with an overview of autonomous synthesis methods.
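For readers new to the formalism, here is a self-contained toy implementation of the two classic BT composites, a Fallback and a Sequence, ticking hypothetical pick-and-place leaves. A real system would use a dedicated BT library; this sketch only illustrates the tick semantics that give BTs their reactivity.

```python
from enum import Enum

class Status(Enum):
    SUCCESS, FAILURE, RUNNING = range(3)

class Fallback:
    """Ticks children in order; returns the first non-FAILURE status."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.FAILURE:
                return status
        return Status.FAILURE

class Sequence:
    """Ticks children in order; returns the first non-SUCCESS status."""
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for child in self.children:
            status = child.tick()
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS

class Leaf:
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

# Hypothetical fragment: succeed if already holding the object, else grasp.
holding = False
def have_object():
    return Status.SUCCESS if holding else Status.FAILURE
def grasp():
    global holding
    holding = True
    return Status.SUCCESS

tree = Fallback(Leaf(have_object), Sequence(Leaf(grasp)))
print(tree.tick())  # Status.SUCCESS
```

Because the root is re-ticked every control cycle, a condition that changes value (e.g., the object slipping from the gripper) immediately reroutes execution, which is the reactivity property the abstract mentions.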