A novel approach of applying deep reinforcement learning to an RF pulse design is introduced. This method, which is referred to as $DeepRF_{SLR}$, is designed to minimize the peak amplitude or, equivalently, minimize the pulse duration of a multiband refocusing pulse generated by the Shinar Le-Roux (SLR) algorithm. In the method, the root pattern of SLR polynomial, which determines the RF pulse shape, is optimized by iterative applications of deep reinforcement learning and greedy tree search. When tested for the designs of the multiband factors of three and seven RFs, $DeepRF_{SLR}$ demonstrated improved performance compared to conventional methods, generating shorter duration RF pulses in shorter computational time. In the experiments, the RF pulse from $DeepRF_{SLR}$ produced a slice profile similar to the minimum-phase SLR RF pulse and the profiles matched to that of the computer simulation. Our approach suggests a new way of designing an RF by applying a machine learning algorithm, demonstrating a machine-designed MRI sequence.