The diagnostic performance of most deep learning models is strongly affected by the choice of model architecture and hyperparameters. The main challenges in model selection are the design of the architecture optimizer and of the model evaluation strategy. In this paper, we propose a novel evolutionary deep neural network (DNN) framework that uses policy gradient to guide the evolution of the DNN architecture towards maximum diagnostic accuracy. We formulate a policy gradient-based controller that generates an action to sample a new model architecture at every generation. The best fitness obtained is used as the reward to update the policy parameters. In addition, the best model obtained is transferred to the next generation for quick model evaluation within the NSGA-II evolutionary framework. The algorithm therefore benefits from both the fast non-dominated sorting of NSGA-II and quick model evaluation. The effectiveness of the proposed framework has been validated on three datasets: the Air Compressor dataset, the Case Western Reserve University dataset, and the Paderborn University dataset.
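To make the controller idea concrete, here is a minimal REINFORCE-style sketch of a policy gradient loop in which fitness is fed back as the reward. It rests on several assumptions that are not from the paper: the search space is a single hypothetical choice of hidden-layer width (`CHOICES`), the controller is a softmax policy over those options, and `toy_fitness` is an invented stand-in for the best diagnostic accuracy of a generation. NSGA-II selection, model training, and weight transfer are all omitted, so this should be read as an illustration of the update rule, not as the authors' implementation.

```python
import math
import random

# Hypothetical search space (assumption): each action selects one
# candidate hidden-layer width. The values are illustrative only.
CHOICES = [16, 32, 64, 128]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_fitness(width):
    """Invented stand-in for 'best diagnostic accuracy in a generation'.

    Peaks at width 64; a real system would train and evaluate DNNs here.
    """
    return 1.0 - abs(math.log2(width) - 6) / 4


def reinforce(generations=300, lr=0.5, seed=0):
    rng = random.Random(seed)
    theta = [0.0] * len(CHOICES)  # policy parameters (one logit per choice)
    baseline = 0.0                # moving-average reward baseline
    for _ in range(generations):
        probs = softmax(theta)
        # The controller samples an action, i.e. an architecture choice.
        a = rng.choices(range(len(CHOICES)), weights=probs)[0]
        reward = toy_fitness(CHOICES[a])  # fitness used as the reward
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        # Gradient of log softmax w.r.t. the logits: one_hot(a) - probs.
        for i in range(len(theta)):
            grad = (1.0 if i == a else 0.0) - probs[i]
            theta[i] += lr * advantage * grad
    return softmax(theta)


probs = reinforce()
best = CHOICES[max(range(len(CHOICES)), key=lambda i: probs[i])]
print(best, [round(p, 3) for p in probs])
```

Subtracting a moving-average baseline from the reward is a common variance-reduction choice for REINFORCE; the abstract does not say whether the proposed controller uses one.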