Abstract:We present a data-driven control architecture for modifying the kinematics of robots and artificial avatars to encode specific information, such as the presence or absence of an emotion, in the movements of an avatar or robot driven by a human operator. We validate our approach on an experimental dataset obtained during the reach-to-grasp phase of a pick-and-place task.
Abstract:In addressing control problems such as regulation and tracking through reinforcement learning, it is often required to guarantee that the acquired policy meets essential performance and stability criteria, such as a desired settling time and steady-state error, prior to deployment. Motivated by this necessity, we present a set of results and a systematic reward shaping procedure that (i) ensures the optimal policy generates trajectories that align with specified control requirements and (ii) allows one to assess whether any given policy satisfies them. We validate our approach through comprehensive numerical experiments conducted in two representative environments from OpenAI Gym: the Inverted Pendulum swing-up problem and the Lunar Lander. Utilizing both tabular and deep reinforcement learning methods, our experiments consistently affirm the efficacy of our proposed framework, highlighting its effectiveness in ensuring policy adherence to the prescribed control requirements.
Abstract:One of the major challenges in Deep Reinforcement Learning for control is the need for extensive training to learn the policy. Motivated by this, we present the design of the Control-Tutored Deep Q-Networks (CT-DQN) algorithm, a Deep Reinforcement Learning algorithm that leverages a control tutor, i.e., an exogenous control law, to reduce learning time. The tutor can be designed using an approximate model of the system, without any assumption about knowledge of the system's dynamics, and there is no expectation that it can achieve the control objective if used stand-alone. During learning, the tutor occasionally suggests an action, thus partially guiding exploration. We validate our approach on three scenarios from OpenAI Gym: the inverted pendulum, lunar lander, and car racing. We demonstrate that CT-DQN achieves better or equivalent data efficiency compared to classic function-approximation solutions.
Abstract:The problem of partitioning a power grid into a set of microgrids, or islands, is of interest both for the design of future smart grids and as a last resort to restore power dispatch in sections of a grid affected by an extreme failure. In the literature, this problem is usually solved by turning it into a combinatorial optimization problem, often tackled through generic heuristic methods such as Genetic Algorithms or Tabu Search. In this paper, we take a different route and obtain the grid partition by exploiting the synchronization dynamics of a cyber layer of Kuramoto oscillators, each parameterized as a rough approximation of the dynamics of the grid node it corresponds to. We first present a centralised algorithm and then a decentralised strategy. In the former, nodes are aggregated based on their internode synchronization times, while in the latter they exploit synchronization of the oscillators in the cyber layer to self-organise into islands. Our preliminary results show that the heuristic synchronization-based algorithms converge towards partitions that are comparable to those obtained via other more cumbersome and computationally expensive optimization-based methods.
Abstract:The Painlev\'e paradox is a phenomenon that causes instability in mechanical systems subject to unilateral constraints. While earlier studies were mostly focused on abstract theoretical settings, recent work confirmed the occurrence of the paradox in realistic set-ups. In this paper, we investigate the dynamics and presence of the Painlev\'e phenomenon in a two-link robot in contact with a moving belt, through a bifurcation study. Then, we use the results of this analysis to inform the design of control strategies able to keep the robot sliding on the belt and avoid the onset of undesired lift-off. To this aim, through numerical simulations, we synthesise and compare a PID strategy and a hybrid force/motion control scheme, finding that the latter is able to guarantee better performance and avoid the onset of bouncing motion due to the Painlev\'e phenomenon.