Fault classification in industrial machinery is vital for improving reliability and reducing downtime, yet it remains challenging because vibration patterns vary widely across operating conditions. This study introduces a novel graph-based framework for fault classification that converts time-series vibration data from machinery operating at varying horsepower levels into a graph representation. We use Shannon entropy to determine the optimal window size for data segmentation, so that each segment captures significant temporal patterns, and employ Dynamic Time Warping (DTW) to define graph edges based on segment similarity. A Graph Auto Encoder (GAE) comprising a deep graph transformer encoder, a decoder, and an ensemble classifier is developed to learn latent graph representations and classify faults across multiple categories. The GAE's performance is evaluated on the Case Western Reserve University (CWRU) dataset, with cross-dataset generalization assessed on the HUST dataset. Results show that the GAE achieves a mean F1-score of 0.99 on the CWRU dataset, significantly outperforming the baseline models CNN, LSTM, RNN, GRU, and Bi-LSTM (F1-scores: 0.94-0.97; p < 0.05, with the Wilcoxon signed-rank test against Bi-LSTM also giving p < 0.05), particularly in challenging classes (e.g., Class 8: 0.99 vs. 0.71 for Bi-LSTM). Visualization of dataset characteristics reveals that datasets with amplified vibration patterns and diverse fault dynamics enhance generalization. This framework provides a robust solution for fault diagnosis under varying conditions and offers insight into how dataset characteristics affect model performance.
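The graph-construction step described above (entropy-guided segmentation followed by DTW-based edges) can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: the window is chosen as the candidate with the highest mean per-segment histogram entropy, edges connect each segment to its k nearest neighbours under DTW distance, and the input is a synthetic noisy sinusoid rather than CWRU data. The function names `select_window_size` and `build_knn_graph` are hypothetical.

```python
import numpy as np

def shannon_entropy(segment, bins=16, value_range=None):
    """Shannon entropy (in bits) of a segment's amplitude histogram."""
    counts, _ = np.histogram(segment, bins=bins, range=value_range)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def select_window_size(signal, candidates, bins=16):
    """Return the candidate window with the highest mean segment entropy.

    Assumed criterion: the source only says entropy guides the choice.
    """
    value_range = (float(signal.min()), float(signal.max()))
    best, best_score = None, -np.inf
    for w in candidates:
        n = len(signal) // w
        if n == 0:  # window longer than the signal
            continue
        segs = signal[: n * w].reshape(n, w)
        score = np.mean([shannon_entropy(s, bins, value_range) for s in segs])
        if score > best_score:
            best, best_score = w, score
    return best

def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

def build_knn_graph(segments, k=2):
    """Symmetric adjacency matrix linking each segment to its k DTW-nearest
    neighbours (assumed edge rule; a distance threshold would also work)."""
    n = len(segments)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = dtw_distance(segments[i], segments[j])
    adj = np.zeros((n, n), dtype=int)
    for i in range(n):
        neighbours = [j for j in np.argsort(dist[i]) if j != i][:k]
        adj[i, neighbours] = 1
    return np.maximum(adj, adj.T), dist

# Synthetic stand-in for a vibration signal (not real bearing data).
rng = np.random.default_rng(0)
t = np.arange(1024)
signal = np.sin(2 * np.pi * t / 64) + 0.1 * rng.standard_normal(t.size)

candidates = [16, 32, 64]
window = select_window_size(signal, candidates)
segments = signal[: (len(signal) // window) * window].reshape(-1, window)
adjacency, distances = build_knn_graph(segments[:8], k=2)
print("window:", window, "edges:", int(adjacency.sum() // 2))
```

The resulting adjacency matrix, together with per-segment features, is the kind of input a graph autoencoder would consume. The pure-Python DTW loop is quadratic in segment length, so a real pipeline would likely use a windowed or compiled DTW variant.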