Abstract: Large Language Models (LLMs) are transforming the robotics domain by enabling robots to comprehend and execute natural language instructions. A cornerstone benefit of LLMs is their ability to process textual data from technical manuals, instructions, academic papers, and user queries based on the knowledge provided. However, deploying LLM-generated code in robotic systems without safety verification poses significant risks. This paper outlines a safety layer that verifies the code generated by ChatGPT before executing it to control a drone in a simulated environment. The safety layer consists of a GPT-4o model fine-tuned with few-shot learning and supported by knowledge graph prompting (KGP). Our approach improves the safety and compliance of robotic actions, ensuring that they adhere to the regulations of drone operations.
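A minimal sketch of how such a verification gate could sit between code generation and execution. The function names (`generate_fn`, `verify_fn`, `execute_fn`) and their interfaces are illustrative assumptions, not the paper's implementation:

```python
# Illustrative sketch (not the paper's code): a safety layer that gates
# LLM-generated drone commands behind a verification step before execution.

def safe_execute(task_prompt, generate_fn, verify_fn, execute_fn):
    """generate_fn: LLM producing candidate control code for the task.
    verify_fn: checker returning (is_compliant, reason), e.g. a fine-tuned
               few-shot model prompted with a knowledge graph of drone
               regulations (hypothetical interface).
    execute_fn: runs the code in the simulated environment."""
    candidate_code = generate_fn(task_prompt)
    is_compliant, reason = verify_fn(candidate_code)
    if not is_compliant:
        # Reject non-compliant code instead of sending it to the drone.
        return {"executed": False, "reason": reason}
    return {"executed": True, "result": execute_fn(candidate_code)}
```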
Abstract: Pose estimation is a critical task in computer vision with a wide range of applications, from activity monitoring to human-robot interaction. However, most existing methods are computationally expensive or have complex architectures. Here we propose a lightweight attention-based pose estimation network that applies depthwise separable convolution and the Convolutional Block Attention Module (CBAM) to an hourglass backbone. The network significantly reduces the computational complexity (floating-point operations) and the model size (number of parameters), containing only about 10% of the parameters of the original eight-stack Hourglass network. Experiments were conducted on the COCO and MPII datasets using a two-stack hourglass backbone. The results show that our model performs well in comparison with six other lightweight pose estimation models, achieving an average precision of 72.07. The model achieves this performance with only 2.3M parameters and 3.7G FLOPs.
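A hedged PyTorch sketch of the two building blocks named in this abstract: a depthwise separable convolution and a CBAM-style attention module. Channel counts and the reduction ratio are illustrative, not taken from the paper:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """3x3 depthwise conv + 1x1 pointwise conv (cuts parameters and FLOPs)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class CBAM(nn.Module):
    """Channel attention (avg/max pooling + shared MLP) followed by spatial attention."""
    def __init__(self, ch, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(ch, ch // reduction), nn.ReLU(),
                                 nn.Linear(ch // reduction, ch))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)      # channel attention
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))             # spatial attention
```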
Abstract: Emerging research in Pluralistic Artificial Intelligence (AI) alignment seeks to address how intelligent systems can be designed and deployed in accordance with diverse human needs and values. We contribute to this pursuit with a dynamic approach for aligning AI with diverse and shifting user preferences through Multi-Objective Reinforcement Learning (MORL), via post-learning policy selection adjustment. In this paper, we introduce the proposed framework for this approach, outline its anticipated advantages and assumptions, and discuss technical details of the implementation. We also examine the broader implications of adopting a retroactive alignment approach from a sociotechnical systems perspective.
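A minimal sketch of what post-learning policy selection could look like: policies are trained first, and the one actually deployed is chosen (and re-chosen) from the current user's preference weights. The policy names and per-objective return vectors are hypothetical placeholders, not the paper's framework:

```python
import numpy as np

# Illustrative sketch of retroactive alignment via post-learning policy selection.
policy_returns = {
    "policy_A": np.array([0.9, 0.2, 0.5]),  # per-objective expected returns (placeholders)
    "policy_B": np.array([0.4, 0.8, 0.6]),
    "policy_C": np.array([0.6, 0.6, 0.7]),
}

def select_policy(user_weights):
    """Pick the policy whose linearly scalarized return best matches the
    user's current preference weights; re-run whenever preferences shift."""
    w = np.asarray(user_weights, dtype=float)
    w = w / w.sum()
    return max(policy_returns, key=lambda p: float(w @ policy_returns[p]))

print(select_policy([0.7, 0.1, 0.2]))  # -> policy_A under these weights
print(select_policy([0.1, 0.7, 0.2]))  # -> policy_B after preferences shift
```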
Abstract: In reinforcement learning, reward shaping is an efficient way to guide an agent's learning process, as the reward can indicate the optimal policy of the task. The potential-based reward shaping framework was proposed to guarantee policy invariance after reward shaping, where a potential function is used to calculate the shaping reward. In prior work, we proposed a novel adaptive potential function (APF) method that learns the potential function concurrently with training the agent, based on information collected by the agent during training, and examined the APF method in discrete action space scenarios. This paper investigates the feasibility of using APF to solve a continuous reaching task in a real-world robotic scenario with a continuous action space. We combine the Deep Deterministic Policy Gradient (DDPG) algorithm with our proposed method to form a new algorithm called APF-DDPG. To compare APF-DDPG with DDPG, we designed a task in which the agent learns to control Baxter's right arm to reach a goal position. The experimental results show that APF-DDPG outperforms DDPG in both learning speed and robustness.
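For reference, potential-based reward shaping adds the term gamma * Phi(s') - Phi(s) to the environment reward, which preserves the optimal policy. The sketch below uses a simple hand-crafted potential (negative distance to a hypothetical goal) as a stand-in; the paper's APF instead learns the potential from data collected during training:

```python
import numpy as np

GAMMA = 0.99
goal = np.array([0.5, 0.3, 0.2])   # hypothetical end-effector goal position (x, y, z)

def potential(state):
    """Stand-in potential: higher (less negative) closer to the goal."""
    return -np.linalg.norm(np.asarray(state) - goal)

def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s).
    The shaping term telescopes over trajectories, so the optimal policy is unchanged."""
    return reward + gamma * potential(next_state) - potential(state)

# Usage inside a DDPG-style training loop (transition names are placeholders):
# r = shaped_reward(r_env, s, s_next)
# replay_buffer.add(s, a, r, s_next, done)
```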
Abstract: Emotion recognition plays a crucial role in various domains of human-robot interaction. In long-term interactions with humans, robots need to respond continuously and accurately; however, mainstream emotion recognition methods mostly focus on short-term recognition and disregard the context in which emotions are perceived. Humans take contextual information into account, and different contexts can lead to completely different emotional expressions. In this paper, we introduce the Self Context-Aware Model (SCAM), which employs a two-dimensional emotion coordinate system for anchoring and re-labeling distinct emotions, and incorporates a distinctive information retention structure and a contextual loss. This approach has yielded significant improvements across the audio, video, and multimodal settings. In the auditory modality, accuracy improved from 63.10% to 72.46%; in the visual modality, from 77.03% to 80.82%; and in the multimodal setting, from 77.48% to 78.93%. In future work, we will validate the reliability and usability of SCAM on robots through psychology experiments.
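A small sketch of what anchoring and re-labeling emotions in a two-dimensional coordinate system can look like. The anchor coordinates and axis semantics below are hypothetical placeholders (valence/arousal-style axes), not the values used by SCAM:

```python
import math

ANCHORS = {              # (dimension 1, dimension 2): illustrative coordinates only
    "happy":   ( 0.8,  0.5),
    "angry":   (-0.6,  0.7),
    "sad":     (-0.7, -0.4),
    "neutral": ( 0.0,  0.0),
}

def relabel(point):
    """Map a continuous 2-D emotion estimate back to the nearest anchored label."""
    return min(ANCHORS, key=lambda k: math.dist(point, ANCHORS[k]))

print(relabel((0.6, 0.3)))    # -> "happy"
print(relabel((-0.5, -0.2)))  # -> "sad"
```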
Abstract: We present a research outline that aims to investigate group dynamics and peer pressure in the context of industrial robots. Our research plan is motivated by the fact that industrial robots have already become an integral part of human-robot co-working. However, industrial robots have been sparsely integrated into research on robot credibility, group dynamics, and potential users' tendency to follow a robot's indication. Therefore, we aim to transfer the classic Asch experiment (see \cite{Asch_51}) into HRI with industrial robots. More precisely, we will test to what extent participants follow a robot's response when confronted with a group of (vs. an individual) industrial robot arm (vs. human) peers who give a false response. We are interested in highlighting the effects of group size, perceived robot credibility, psychological stress, and peer pressure in the context of industrial robots. With the results of this research, we hope to highlight group dynamics that might underlie HRI in industrial settings, where numerous robots already work closely together with humans in shared environments.
Abstract: Reinforcement learning is a machine learning approach based on behavioral psychology. It focuses on learning agents that can acquire knowledge and learn to carry out new tasks by interacting with the environment. However, a problem arises when reinforcement learning is used in critical contexts, where users of the system need more information and reliability regarding the actions executed by an agent. In this regard, explainable reinforcement learning seeks to provide an agent in training with methods to explain its behavior in such a way that users with no experience in machine learning can understand it. One such method is memory-based explainable reinforcement learning, which uses an episodic memory to compute probabilities of success for each state-action pair. In this work, we propose to apply the memory-based explainable reinforcement learning method in a hierarchical environment composed of sub-tasks that must first be addressed to solve a more complex task. The end goal is to verify whether the agent can be given the ability to explain its actions in the global task as well as in the sub-tasks. The results obtained show that it is possible to use the memory-based method in hierarchical environments with high-level tasks and to compute probabilities of success that serve as a basis for explaining the agent's behavior.
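A minimal sketch of the memory-based idea described above: count how often each state-action pair occurred in successful episodes and use the ratio as an explanation-friendly probability of success. The bookkeeping details are placeholders, not the paper's implementation; in a hierarchical setting, a separate memory could be kept per sub-task:

```python
from collections import defaultdict

visits = defaultdict(int)     # (state, action) -> times taken
successes = defaultdict(int)  # (state, action) -> times the episode succeeded

def record_episode(transitions, episode_succeeded):
    """transitions: list of (state, action) pairs from one episode."""
    for sa in transitions:
        visits[sa] += 1
        if episode_succeeded:
            successes[sa] += 1

def probability_of_success(state, action):
    sa = (state, action)
    return successes[sa] / visits[sa] if visits[sa] else 0.0

# Such probabilities can back an explanation like:
# "Action 'move_to_subgoal' was chosen because it succeeded in 82% of past episodes."
```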
Abstract: In recent years, unmanned aerial vehicle (UAV)-related technology has expanded knowledge in the area, bringing to light new problems and challenges that require solutions. Furthermore, because the technology allows processes usually carried out by people to be automated, it is in great demand in industrial sectors. The automation of these vehicles has been addressed in the literature using different machine learning strategies. Reinforcement learning (RL) is a framework frequently used to train autonomous agents; it is a machine learning paradigm in which an agent interacts with an environment to solve a given task. However, learning autonomously can be time consuming and computationally expensive, and may not be practical in highly complex scenarios. Interactive reinforcement learning allows an external trainer to provide advice to an agent while it is learning a task. In this study, we set out to teach an RL agent to control a drone using reward-shaping and policy-shaping techniques simultaneously. Two simulated scenarios were proposed for training: one without obstacles and one with obstacles. We also studied the influence of each technique individually. The results show that an agent trained with both techniques simultaneously obtains a lower reward than an agent trained using only a policy-based approach. Nevertheless, the agent achieves lower execution times and less dispersion during training.
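A minimal sketch of how the two interactive techniques named above can be combined in one training loop: trainer advice biases action selection (policy shaping) and adds a bonus to the environment reward when followed (reward shaping). The advisor interface and constants are assumptions, not the study's setup:

```python
import random

ADVICE_BONUS = 0.5   # hypothetical reward-shaping bonus for following advice
FOLLOW_PROB = 0.8    # hypothetical policy-shaping probability of taking the advised action

def select_action(state, greedy_action, advisor):
    """Policy shaping: bias action selection toward the trainer's advice."""
    advised = advisor(state)                 # returns an action or None
    if advised is not None and random.random() < FOLLOW_PROB:
        return advised
    return greedy_action

def shape_reward(env_reward, action, advised_action):
    """Reward shaping: add a bonus when the executed action matches the advice."""
    if advised_action is not None and action == advised_action:
        return env_reward + ADVICE_BONUS
    return env_reward
```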
Abstract: With the increasing presence of robotic systems and human-robot environments in today's society, understanding the reasoning behind the actions taken by a robot is becoming more important. To increase this understanding, users are provided with explanations of why a specific action was taken. Among other effects, these explanations improve users' trust in their robotic partners. One option for creating such explanations is an introspection-based approach, which can be used in conjunction with reinforcement learning agents to provide probabilities of success. These can in turn be used to reason about the actions taken by the agent in a human-understandable fashion. In this work, the introspection-based approach is developed and evaluated further on the basis of an episodic and a non-episodic robotics simulation task. Furthermore, an additional normalization step for the Q-values is proposed, which enables the use of the introspection-based approach on negative and comparatively small Q-values. The results show the viability of introspection for episodic robotics tasks and, additionally, that the introspection-based approach can also be used to generate explanations for the actions taken in a non-episodic robotics environment.
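One simple normalization consistent with the description above is a min-max rescaling of the Q-values into [0, 1] so that negative or very small values can still be read as probabilities of success. This is a hedged sketch of the idea, not necessarily the exact transform used in the paper:

```python
import numpy as np

def success_probabilities(q_values):
    """q_values: Q(s, a) for all actions in a given state (may be negative or small)."""
    q = np.asarray(q_values, dtype=float)
    q_min, q_max = q.min(), q.max()
    if np.isclose(q_min, q_max):
        return np.full_like(q, 0.5)    # no preference among actions
    return (q - q_min) / (q_max - q_min)

print(success_probabilities([-3.2, -1.1, -2.5]))  # negative Q-values still map into [0, 1]
```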
Abstract: The use of interactive advice in reinforcement learning scenarios allows the learning process of autonomous agents to be sped up. Current interactive reinforcement learning research has been limited to real-time interactions that offer advice relevant only to the current state. Moreover, the information provided by each interaction is not retained, but discarded by the agent after a single use. In this paper, we present a method for retaining and reusing provided knowledge, allowing trainers to give general advice relevant to more than just the current state. The results obtained show that the use of broad-persistent advice substantially improves the performance of the agent while reducing the number of interactions required from the trainer.
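A minimal sketch of retaining and reusing advice: instead of discarding it after one use, advice is stored against a generalized description of the state so it can be reapplied to future, similar states. The abstraction function and action names are placeholders, not the paper's method:

```python
advice_store = {}

def state_abstraction(state):
    """Hypothetical generalization, e.g. coarse discretization of the state."""
    return tuple(round(v, 1) for v in state)

def give_advice(state, advised_action):
    advice_store[state_abstraction(state)] = advised_action

def lookup_advice(state):
    return advice_store.get(state_abstraction(state))

give_advice((0.52, 0.31), "turn_left")
print(lookup_advice((0.49, 0.33)))  # -> "turn_left" (reused for a similar state)
```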