The optimal control of open quantum systems is challenging but plays a key role in improving existing quantum information processing technologies. We introduce a general framework based on reinforcement learning to discover optimal thermodynamic cycles that maximize the power of out-of-equilibrium quantum heat engines and refrigerators. We apply our method, based on the soft actor-critic algorithm, to three systems: a benchmark two-level system heat engine, where we find the known optimal cycle; an experimentally realistic refrigerator based on a superconducting qubit that generates coherence, where we find a non-intuitive control sequence that outperforms previous cycles proposed in the literature; and a heat engine based on a quantum harmonic oscillator, where we find a cycle with an elaborate structure that outperforms the optimized Otto cycle. We then evaluate the corresponding efficiency at maximum power.