In this letter, we address millimeter-wave (mmWave) channel estimation in massive MIMO communication systems. Exploiting the sparsity of the mmWave MIMO channel in the angular domain, we formulate channel estimation as a sparse signal recovery problem and propose a deep-learning-based trainable proximal gradient descent network (TPGD-Net) to solve it. Specifically, we unfold the iterative proximal gradient descent (PGD) algorithm into a layer-wise network. Unlike in the PGD algorithm, the gradient descent step size in TPGD-Net is a trainable parameter. Moreover, the proximal operator of the PGD algorithm is replaced by a tailored neural network that incorporates data-driven prior channel information and carries out the proximal mapping implicitly. We further improve the performance of TPGD-Net by introducing an inter-stage feature pathway module, which alleviates the bottleneck in passing feature information between adjacent layers. Simulation results on the Saleh-Valenzuela channel model and the DeepMIMO dataset demonstrate the effectiveness of TPGD-Net compared with state-of-the-art mmWave channel estimators.
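For orientation, the classical PGD iteration that TPGD-Net unfolds can be sketched as follows. This is a minimal illustration and not the proposed network. It assumes a real-valued measurement model y = A x with an l1 sparsity penalty, so that the proximal operator reduces to soft-thresholding, and it uses a fixed step size. The function names (`soft_threshold`, `pgd`) and the parameters `step` and `lam` are hypothetical choices made for this sketch. In TPGD-Net, the fixed `step` would become a trainable parameter and `soft_threshold` would be replaced by a learned network.

```python
def soft_threshold(v, tau):
    """Element-wise proximal operator of tau * ||x||_1 (soft-thresholding)."""
    return [max(abs(x) - tau, 0.0) * (1.0 if x >= 0 else -1.0) for x in v]


def matvec(A, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T r for a matrix stored as a list of rows."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def pgd(A, y, step, lam, num_iters):
    """Classical PGD (ISTA) for min_x 0.5 * ||A x - y||^2 + lam * ||x||_1.

    Assumption: real-valued data and a hand-picked, fixed step size.
    """
    x = [0.0] * len(A[0])
    for _ in range(num_iters):
        # Gradient of the data-fidelity term: A^T (A x - y).
        residual = [p - q for p, q in zip(matvec(A, x), y)]
        grad = rmatvec(A, residual)
        # Gradient descent step, then the proximal (shrinkage) step.
        z = [xi - step * gi for xi, gi in zip(x, grad)]
        x = soft_threshold(z, step * lam)
    return x
```

Each pass through the loop corresponds to one layer of an unfolded network. Fixing `num_iters` in advance and learning the per-layer quantities from data is the general idea behind unfolding.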