Abstract:Soft robots, while highly adaptable to diverse environments through various actuation methods, still face significant performance boundary due to the inherent properties of materials. These limitations manifest in the challenge of guaranteeing rapid response and large-scale movements simultaneously, ultimately restricting the robots' absolute speed and overall efficiency. In this paper, we introduce a high-frequency pneumatic oscillator (HIPO) to overcome these challenges. Through a collision-induced phase resetting mechanism, our HIPO leverages event-based nonlinearity to trigger self-oscillation of pneumatic actuator, which positively utilizes intrinsic characteristics of materials. This enables the system to spontaneously generate periodic control signals and directly produce motion responses, eliminating the need for incorporating external actuation components. By efficiently and rapidly converting internal energy of airflow into the kinetic energy of robots, HIPO achieves a frequency of up to 20 Hz. Furthermore, we demonstrate the versatility and high-performance capabilities of HIPO through bio-inspired robots: an insect-like fast-crawler (with speeds up to 50.27 cm/s), a high-frequency butterfly-like wing-flapper, and a maneuverable duck-like swimmer. By eliminating external components and seamlessly fusing signal generation, energy conversion, and motion output, HIPO unleashes rapid and efficient motion, unlocking potential for high-performance soft robotics.
Abstract:Continuous reinforcement learning such as DDPG and A3C are widely used in robot control and autonomous driving. However, both methods have theoretical weaknesses. While DDPG cannot control noises in the control process, A3C does not satisfy the continuity conditions under the Gaussian policy. To address these concerns, we propose a new continues reinforcement learning method based on stochastic differential equations and we call it Incremental Reinforcement Learning (IRL). This method not only guarantees the continuity of actions within any time interval, but controls the variance of actions in the training process. In addition, our method does not assume Markov control in agents' action control and allows agents to predict scene changes for action selection. With our method, agents no longer passively adapt to the environment. Instead, they positively interact with the environment for maximum rewards.