Abstract:In the field of robotics many different approaches ranging from classical planning over optimal control to reinforcement learning (RL) are developed and borrowed from other fields to achieve reliable control in diverse tasks. In order to get a clear understanding of their individual strengths and weaknesses and their applicability in real world robotic scenarios is it important to benchmark and compare their performances not only in a simulation but also on real hardware. The '2nd AI Olympics with RealAIGym' competition was held at the IROS 2024 conference to contribute to this cause and evaluate different controllers according to their ability to solve a dynamic control problem on an underactuated double pendulum system with chaotic dynamics. This paper describes the four different RL methods submitted by the participating teams, presents their performance in the swing-up task on a real double pendulum, measured against various criteria, and discusses their transferability from simulation to real hardware and their robustness to external disturbances.
Abstract:Recent advancements in Large Language Models (LLMs) and Visual Language Models (VLMs) have significantly impacted robotics, enabling high-level semantic motion planning applications. Reinforcement Learning (RL), a complementary paradigm, enables agents to autonomously optimize complex behaviors through interaction and reward signals. However, designing effective reward functions for RL remains challenging, especially in real-world tasks where sparse rewards are insufficient and dense rewards require elaborate design. In this work, we propose Autonomous Reinforcement learning for Complex HumanInformed Environments (ARCHIE), an unsupervised pipeline leveraging GPT-4, a pre-trained LLM, to generate reward functions directly from natural language task descriptions. The rewards are used to train RL agents in simulated environments, where we formalize the reward generation process to enhance feasibility. Additionally, GPT-4 automates the coding of task success criteria, creating a fully automated, one-shot procedure for translating human-readable text into deployable robot skills. Our approach is validated through extensive simulated experiments on single-arm and bi-manual manipulation tasks using an ABB YuMi collaborative robot, highlighting its practicality and effectiveness. Tasks are demonstrated on the real robot setup.
Abstract:Pick-and-place (PnP) operations, featuring object grasping and trajectory planning, are fundamental in industrial robotics applications. Despite many advancements in the field, PnP is limited by workspace constraints, reducing flexibility. Pick-and-throw (PnT) is a promising alternative where the robot throws objects to target locations, leveraging extrinsic resources like gravity to improve efficiency and expand the workspace. However, PnT execution is complex, requiring precise coordination of high-speed movements and object dynamics. Solutions to the PnT problem are categorized into analytical and learning-based approaches. Analytical methods focus on system modeling and trajectory generation but are time-consuming and offer limited generalization. Learning-based solutions, in particular Model-Free Reinforcement Learning (MFRL), offer automation and adaptability but require extensive interaction time. This paper introduces a Model-Based Reinforcement Learning (MBRL) framework, MC-PILOT, which combines data-driven modeling with policy optimization for efficient and accurate PnT tasks. MC-PILOT accounts for model uncertainties and release errors, demonstrating superior performance in simulations and real-world tests with a Franka Emika Panda manipulator. The proposed approach generalizes rapidly to new targets, offering advantages over analytical and Model-Free methods.
Abstract:This report describes our proposed solution for the second AI Olympics competition held at IROS 2024. Our solution is based on a recent Model-Based Reinforcement Learning algorithm named MC-PILCO. Besides briefly reviewing the algorithm, we discuss the most critical aspects of the MC-PILCO implementation in the tasks at hand.
Abstract:The ability to achieve precise and smooth trajectory tracking is crucial for ensuring the successful execution of various tasks involving robotic manipulators. State-of-the-art techniques require accurate mathematical models of the robot dynamics, and robustness to model uncertainties is achieved by relying on precise bounds on the model mismatch. In this paper, we propose a novel adaptive robust feedback linearization scheme able to compensate for model uncertainties without any a-priori knowledge on them, and we provide a theoretical proof of convergence under mild assumptions. We evaluate the method on a simulated RR robot. First, we consider a nominal model with known model mismatch, which allows us to compare our strategy with state-of-the-art uncertainty-aware methods. Second, we implement the proposed control law in combination with a learned model, for which uncertainty bounds are not available. Results show that our method leads to performance comparable to uncertainty-aware methods while requiring less prior knowledge.
Abstract:This paper introduces innovative methods in Reinforcement Learning (RL), focusing on addressing and exploiting estimation biases in Actor-Critic methods for continuous control tasks, using Deep Double Q-Learning. We propose two novel algorithms: Expectile Delayed Deep Deterministic Policy Gradient (ExpD3) and Bias Exploiting - Twin Delayed Deep Deterministic Policy Gradient (BE-TD3). ExpD3 aims to reduce overestimation bias with a single $Q$ estimate, offering a balance between computational efficiency and performance, while BE-TD3 is designed to dynamically select the most advantageous estimation bias during training. Our extensive experiments across various continuous control tasks demonstrate the effectiveness of our approaches. We show that these algorithms can either match or surpass existing methods like TD3, particularly in environments where estimation biases significantly impact learning. The results underline the importance of bias exploitation in improving policy learning in RL.
Abstract:In this paper, we propose a black-box model based on Gaussian process regression for the identification of the inverse dynamics of robotic manipulators. The proposed model relies on a novel multidimensional kernel, called \textit{Lagrangian Inspired Polynomial} (\kernelInitials{}) kernel. The \kernelInitials{} kernel is based on two main ideas. First, instead of directly modeling the inverse dynamics components, we model as GPs the kinetic and potential energy of the system. The GP prior on the inverse dynamics components is derived from those on the energies by applying the properties of GPs under linear operators. Second, as regards the energy prior definition, we prove a polynomial structure of the kinetic and potential energy, and we derive a polynomial kernel that encodes this property. As a consequence, the proposed model allows also to estimate the kinetic and potential energy without requiring any label on these quantities. Results on simulation and on two real robotic manipulators, namely a 7 DOF Franka Emika Panda and a 6 DOF MELFA RV4FL, show that the proposed model outperforms state-of-the-art black-box estimators based both on Gaussian Processes and Neural Networks in terms of accuracy, generality and data efficiency. The experiments on the MELFA robot also demonstrate that our approach achieves performance comparable to fine-tuned model-based estimators, despite requiring less prior information.
Abstract:In this paper, we propose to estimate the forward dynamics equations of mechanical systems by learning a model of the inverse dynamics and estimating individual dynamics components from it. We revisit the classical formulation of rigid body dynamics in order to extrapolate the physical dynamical components, such as inertial and gravitational components, from an inverse dynamics model. After estimating the dynamical components, the forward dynamics can be computed in closed form as a function of the learned inverse dynamics. We tested the proposed method with several machine learning models based on Gaussian Process Regression and compared them with the standard approach of learning the forward dynamics directly. Results on two simulated robotic manipulators, a PANDA Franka Emika and a UR10, show the effectiveness of the proposed method in learning the forward dynamics, both in terms of accuracy as well as in opening the possibility of using more structured~models.
Abstract:We propose a Model-Based Reinforcement Learning (MBRL) algorithm named VF-MC-PILCO, specifically designed for application to mechanical systems where velocities cannot be directly measured. This circumstance, if not adequately considered, can compromise the success of MBRL approaches. To cope with this problem, we define a velocity-free state formulation which consists of the collection of past positions and inputs. Then, VF-MC-PILCO uses Gaussian Process Regression to model the dynamics of the velocity-free state and optimizes the control policy through a particle-based policy gradient approach. We compare VF-MC-PILCO with our previous MBRL algorithm, MC-PILCO4PMS, which handles the lack of direct velocity measurements by modeling the presence of velocity estimators. Results on both simulated (cart-pole and UR5 robot) and real mechanical systems (Furuta pendulum and a ball-and-plate rig) show that the two algorithms achieve similar results. Conveniently, VF-MC-PILCO does not require the design and implementation of state estimators, which can be a challenging and time-consuming activity to be performed by an expert user.
Abstract:In this paper, we consider the use of black-box Gaussian process (GP) models for trajectory tracking control based on feedback linearization, in the context of mechanical systems. We considered two strategies. The first computes the control input directly by using the GP model, whereas the second computes the input after estimating the individual components of the dynamics. We tested the two strategies on a simulated manipulator with seven degrees of freedom, also varying the GP kernel choice. Results show that the second implementation is more robust w.r.t. the kernel choice and model inaccuracies. Moreover, as regards the choice of kernel, the obtained performance shows that the use of a structured kernel, such as a polynomial kernel, is advantageous, because of its effectiveness with both strategies.