Abstract: In this work, the novel, open-source humanoid robot PANDORA is presented, in which the majority of the structural elements are manufactured from 3D-printed compliant materials. As opposed to contemporary approaches that incorporate the elastic element into the actuator mechanisms, PANDORA is designed to be compliant under load, or in other words, structurally elastic. This design approach lowers manufacturing cost and time, design complexity, and assembly time, while introducing control challenges in state estimation and in joint and whole-body control. This work features an in-depth description of the mechanical and electrical subsystems, including the benefits and drawbacks of additive manufacturing, the usage and placement of sensors, and the networking between devices. In addition, the design of the structurally elastic components and their effects on overall performance from an estimation and control perspective are discussed. Finally, results are presented demonstrating the robot robustly balancing in the presence of disturbances and executing stepping behaviors.
Abstract: Many state-of-the-art robotic applications utilize series elastic actuators (SEAs) with closed-loop force control to achieve complex tasks such as walking, lifting, and manipulation. Model-free PID control methods are more prone to instability due to nonlinearities in the SEA, whereas cascaded model-based robust controllers can remove these effects to achieve stable force control. However, these model-based methods require detailed investigations to characterize the system accurately. Deep reinforcement learning (DRL) has proved to be an effective model-free method for continuous control tasks, though few works address learning directly on hardware. This paper describes the training, directly on the hardware of an SEA pendulum system, of a DRL policy that tracks force trajectories from 0.05 to 0.35 Hz at 50 N amplitude using the Proximal Policy Optimization (PPO) algorithm. Safety mechanisms are developed and utilized to train the policy for 12 hours (overnight) without an operator present within the full 21-hour training period. The tracking performance is evaluated, showing an improvement of 25 N in mean absolute error between the first 18 min of training and the full 21 hours for a 50 N amplitude, 0.1 Hz sinusoidal desired force trajectory. Finally, the DRL policy exhibits better tracking and stability margins than a model-free PID controller on a 50 N chirp force trajectory.
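To make the training setup concrete, below is a minimal sketch of how such a PPO force-tracking loop might be structured using the open-source stable-baselines3 and Gymnasium libraries. This is not the paper's implementation: the `SEAPendulumEnv` class, its first-order stand-in dynamics, and all parameter values are illustrative assumptions. On the real system, `step` would command the motor and read the spring force sensor, and the `force_limit` termination plays the role of the safety mechanisms that allowed unattended overnight training.

```python
# Hedged sketch of PPO training for SEA force tracking.
# SEAPendulumEnv and all numeric values are hypothetical stand-ins.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class SEAPendulumEnv(gym.Env):
    """Simulated stand-in for the hardware force-tracking loop."""

    def __init__(self, amplitude=50.0, freq_hz=0.1, dt=0.02,
                 horizon=1000, force_limit=200.0):
        super().__init__()
        self.amplitude = amplitude      # desired force amplitude [N]
        self.freq_hz = freq_hz          # desired trajectory frequency [Hz]
        self.dt = dt                    # control period [s]
        self.horizon = horizon          # steps per episode
        self.force_limit = force_limit  # safety cutoff [N] (assumed value)
        # Observation: [measured force, desired force, tracking error]
        self.observation_space = spaces.Box(-np.inf, np.inf,
                                            shape=(3,), dtype=np.float32)
        # Action: normalized motor command in [-1, 1]
        self.action_space = spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float32)
        self.step_count = 0
        self.force = 0.0

    def _desired_force(self):
        return self.amplitude * np.sin(
            2.0 * np.pi * self.freq_hz * self.step_count * self.dt)

    def _get_obs(self):
        desired = self._desired_force()
        return np.array([self.force, desired, desired - self.force],
                        dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.step_count = 0
        self.force = 0.0
        return self._get_obs(), {}

    def step(self, action):
        # On hardware: send the motor command, then read the load cell.
        # Here a first-order lag stands in for the SEA dynamics.
        command = float(action[0]) * self.force_limit
        self.force += 0.1 * (command - self.force)
        self.step_count += 1
        error = self._desired_force() - self.force
        reward = -abs(error)                            # penalize tracking error
        terminated = abs(self.force) > self.force_limit  # safety stop
        truncated = self.step_count >= self.horizon
        return self._get_obs(), reward, terminated, truncated, {}


env = SEAPendulumEnv()
model = PPO("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=100_000)
```

One appeal of this structure is that the same Gymnasium interface can wrap either a simulator or the physical test stand, so the identical training code runs against hardware, with episode termination on the safety cutoff standing in for operator supervision.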