Place recognition, or loop closure detection, is one of the core components of a full SLAM system. In this paper, aiming to strengthen the relevance of local neighboring points and the contextual dependency among global points simultaneously, we investigate transformer-based networks for feature extraction and propose a Hierarchical Transformer for Place Recognition (HiTPR). The HiTPR consists of four major parts: point cell generation, a short-range transformer (SRT), a long-range transformer (LRT), and global descriptor aggregation. Specifically, the point cloud is first divided into a sequence of small cells by downsampling and nearest neighbor search. The SRT extracts a local feature for each point cell, while the LRT builds global dependencies among all of the point cells in the whole point cloud. Experiments on several standard benchmarks demonstrate the superiority of HiTPR in terms of average recall rate, achieving, for example, 93.71% at top 1% and 86.63% at top 1 on the Oxford RobotCar dataset.
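To make the four-stage pipeline concrete, the following is a minimal sketch of a HiTPR-style forward pass in PyTorch. It is not the authors' implementation: the module names (`HiTPRSketch`, `make_point_cells`), the dimensions, the random downsampling of cell centers, the use of standard `nn.TransformerEncoder` layers for both the SRT and LRT, and the max-pooling aggregation are all illustrative assumptions made for this sketch.

```python
# A minimal, self-contained sketch of a HiTPR-style pipeline (assumed layout,
# not the paper's implementation): point cell generation -> short-range
# transformer -> long-range transformer -> global descriptor aggregation.
import torch
import torch.nn as nn


def make_point_cells(points, num_cells=256, k=16):
    """Divide a point cloud into cells: pick cell centers by (random)
    downsampling, then gather the k nearest neighbors of each center."""
    B, N, _ = points.shape
    idx = torch.randperm(N)[:num_cells]               # downsampled cell centers
    centers = points[:, idx, :]                       # (B, num_cells, 3)
    dists = torch.cdist(centers, points)              # (B, num_cells, N)
    knn_idx = dists.topk(k, dim=-1, largest=False).indices
    # Gather neighbor coordinates for every cell: (B, num_cells, k, 3)
    cells = torch.gather(
        points.unsqueeze(1).expand(B, num_cells, N, 3),
        2,
        knn_idx.unsqueeze(-1).expand(B, num_cells, k, 3),
    )
    return cells


class HiTPRSketch(nn.Module):
    def __init__(self, d_model=128, num_cells=256, k=16):
        super().__init__()
        self.num_cells, self.k = num_cells, k
        self.embed = nn.Linear(3, d_model)             # per-point embedding
        srt_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        lrt_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.srt = nn.TransformerEncoder(srt_layer, num_layers=2)  # short-range
        self.lrt = nn.TransformerEncoder(lrt_layer, num_layers=2)  # long-range
        self.proj = nn.Linear(d_model, 256)            # global descriptor head

    def forward(self, points):                         # points: (B, N, 3)
        cells = make_point_cells(points, self.num_cells, self.k)
        B = cells.shape[0]
        x = self.embed(cells)                          # (B, C, k, d)
        # Short-range transformer: self-attention among the k points of a cell.
        x = self.srt(x.flatten(0, 1))                  # (B*C, k, d)
        cell_feat = x.max(dim=1).values.view(B, self.num_cells, -1)
        # Long-range transformer: self-attention among all cells of the cloud.
        cell_feat = self.lrt(cell_feat)                # (B, C, d)
        # Aggregate cell features into a single global place descriptor.
        desc = self.proj(cell_feat.max(dim=1).values)  # (B, 256)
        return nn.functional.normalize(desc, dim=-1)


if __name__ == "__main__":
    model = HiTPRSketch()
    clouds = torch.randn(2, 4096, 3)                   # two toy point clouds
    print(model(clouds).shape)                         # torch.Size([2, 256])
```

The resulting L2-normalized descriptor would then be compared by nearest neighbor search against a database of descriptors from previously visited places, which is how recall at top 1 and top 1% is typically evaluated.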