This paper studies the quality-of-service (QoS) constrained multi-group multicast beamforming design problem, where each multicast group is composed of a number of users requiring the same content. Due to the nonconvex QoS constraints, this problem is nonconvex and NP-hard. While existing optimization-based iterative algorithms can obtain a suboptimal solution, their iterative nature results in large computational complexity and delay. To facilitate real-time implementations, this paper proposes a deep learning-based approach, which consists of a beamforming structure assisted problem transformation and a customized neural network architecture named hierarchical permutation equivariance (HPE) transformer. The proposed HPE transformer is proved to be permutation equivariant with respect to the users within each multicast group, and also permutation equivariant with respect to different multicast groups. Simulation results demonstrate that the proposed HPE transformer outperforms state-of-the-art optimization-based and deep learning-based approaches for multi-group multicast beamforming design in terms of the total transmit power, the constraint violation, and the computational time. In addition, the proposed HPE transformer achieves pretty good generalization performance on different numbers of users, different numbers of multicast groups, and different signal-to-interference-plus-noise ratio targets.