This paper presents a comprehensive review of mechanical equipment fault diagnosis methods, focusing on the advances enabled by Transformer-based models. It describes the structure, working principles, and benefits of Transformers, in particular the self-attention mechanism and parallel computation that have driven their widespread adoption in natural language processing and computer vision. The discussion highlights key Transformer variants, such as the Vision Transformer (ViT) and its extensions, which leverage self-attention to improve accuracy and efficiency on visual tasks. The paper then examines Transformer-based approaches to intelligent fault diagnosis for mechanical systems, showing how they extract and recognize patterns in complex sensor data to identify faults precisely. Despite these advances, challenges remain: reliance on large labeled datasets, high computational cost, and difficulty deploying models on resource-constrained devices. To address these limitations, the paper outlines future research directions, including lightweight Transformer architectures, integration of multimodal data sources, and improved adaptability to diverse operating conditions. These efforts aim to broaden the application of Transformer-based methods in mechanical fault diagnosis, making them more robust, efficient, and suitable for real-world industrial environments.
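
For reference, and not specific to any single variant surveyed here, the self-attention mechanism at the core of these models is the standard scaled dot-product attention, in which the queries $Q$, keys $K$, and values $V$ are linear projections of the input sequence and $d_k$ is the key dimension:

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\]

Because this expression is computed over all sequence positions simultaneously, it is also the source of the parallel computation capability noted above.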