Channel estimation is one of the key issues in practical massive multiple-input multiple-output (MIMO) systems. Compared with conventional estimation algorithms, deep learning (DL) based ones have exhibited great potential in terms of performance and complexity. In this paper, an attention mechanism, exploiting the channel distribution characteristics, is proposed to improve the estimation accuracy of highly separable channels with narrow angular spread by realizing the "divide-and-conquer" policy. Specifically, we introduce a novel attention-aided DL channel estimation framework for conventional massive MIMO systems and devise an embedding method to effectively integrate the attention mechanism into the fully connected neural network for the hybrid analog-digital (HAD) architecture. Simulation results show that in both scenarios, the channel estimation performance is significantly improved with the aid of attention at the cost of small complexity overhead. Furthermore, strong robustness under different system and channel parameters can be achieved by the proposed approach, which further strengthens its practical value. We also investigate the distributions of learned attention maps to reveal the role of attention, which endows the proposed approach with a certain degree of interpretability.