For decades, researchers have sought fishlike flapping motions that achieve underwater propulsion at low energy cost. The complexity of the unsteady flow field around a flapping body makes this problem very difficult. In earlier studies, motion patterns were usually prescribed as particular periodic functions, which confines the subsequent optimization to a small subdomain of the whole motion space. In this work, to avoid this constraint, a variational autoencoder (VAE) is designed to compress a wide variety of flapping motions into a simple action vector. We then let a flapping airfoil interact continuously with a water tunnel environment and adjust its action accordingly through a reinforcement learning (RL) framework. With this automated closed-loop experiment, we obtain several motion patterns that yield higher hydrodynamic efficiency than pure harmonic motions at the same thrust level. We also find that, after extensive trial and error, RL training in the current experiment always converges to motion patterns that are close to harmonic motions. In other words, the present work shows that harmonic motion with appropriate amplitude and frequency is always an optimal choice for efficient underwater propulsion. Furthermore, the RL framework proposed here can also be extended to other complex swimming problems, which might pave the way for a robotic fish that swims like a real fish.