Deep learning has recently attracted growing attention for phase unwrapping in fringe projection three-dimensional (3D) measurement, where the aim is to improve performance by leveraging powerful convolutional neural network (CNN) models. In this paper, for the first time (to the best of our knowledge), we introduce the Transformer, an architecture distinct from the CNN, into phase unwrapping and propose the Hformer model, which is dedicated to phase unwrapping via fringe order prediction. The proposed model has a hybrid CNN-Transformer architecture, composed mainly of a backbone, an encoder, and a decoder, that takes advantage of both the CNN and the Transformer. The encoder and decoder, which use cross attention, are designed for fringe order prediction. Experimental results show that the proposed Hformer model achieves better fringe order prediction performance than CNN models such as U-Net and DCNN. Moreover, an ablation study on Hformer is conducted to verify the effect of the improved feature pyramid network (FPN) and of the testing strategy with flipping on the predicted fringe order. Our work opens an alternative path for deep learning based phase unwrapping methods, which are currently dominated by CNNs in fringe projection 3D measurement.
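
To make "phase unwrapping via fringe order prediction" concrete, the following is a minimal sketch that assumes the standard relation between the wrapped phase, the integer fringe order k, and the absolute phase: absolute phase = wrapped phase + 2πk. The function name `unwrap_with_fringe_order` and the synthetic one-dimensional data are hypothetical and are not taken from Hformer; in practice the fringe order map would be the per-pixel output of the network, whereas here it is computed from a known ground truth purely for illustration.

```python
import numpy as np


def unwrap_with_fringe_order(wrapped_phase, fringe_order):
    """Recover the absolute phase from a wrapped phase map and an
    integer fringe order map of the same shape (hypothetical helper)."""
    wrapped_phase = np.asarray(wrapped_phase, dtype=np.float64)
    fringe_order = np.asarray(fringe_order)
    # Assumed standard relation: Phi = phi + 2 * pi * k
    return wrapped_phase + 2.0 * np.pi * fringe_order


# Synthetic ground truth: an absolute phase ramp spanning three fringe periods.
true_phase = np.linspace(0.0, 6.0 * np.pi, 64, endpoint=False)

# Fringe order k: index of the 2*pi interval each sample falls in.
# In a learning-based pipeline this map would be predicted, not computed.
fringe_order = np.floor(true_phase / (2.0 * np.pi)).astype(np.int64)

# Wrapped phase in [0, 2*pi), as a phase-shifting measurement would provide.
wrapped_phase = true_phase - 2.0 * np.pi * fringe_order

recovered = unwrap_with_fringe_order(wrapped_phase, fringe_order)
```

Under this formulation the unwrapping step itself is trivial once k is known, so the accuracy of the recovered phase depends on how accurately the fringe order is classified at each pixel.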