The spread of fake news on social networks has created a pressing need for effective and efficient fake news detection methods. Fake news on social networks is typically multi-modal and is connected to other entities, such as users and posts. This heterogeneity, in both the news content and its relationships with other entities, makes it challenging to design a model that comprehensively captures the local multi-modal semantics of entities and the global structural representation of propagation patterns, and thereby classifies fake news effectively and accurately. In this paper, we propose HetTransformer, a novel Transformer-based model for fake news detection on social networks that utilises the encoder-decoder structure of the Transformer to capture the structural information of news propagation patterns. We first capture the local heterogeneous semantics of news, post, and user entities in social networks. We then apply the Transformer to capture the global structural representation of the propagation patterns for fake news detection. Experiments on three real-world datasets demonstrate that our model outperforms state-of-the-art baselines in fake news detection.