Power control for the device-to-device interference channel with single-antenna transceivers has been widely studied using both model-based methods and learning-based approaches. Although the learning-based approaches, i.e., data-driven and model-driven, offer performance improvements, the widely adopted graph neural network struggles to learn the heterophilous power distribution of the interference channel. In this paper, we propose a deep learning architecture from the graph transformer family to circumvent this issue. Experimental results show that the proposed methods achieve state-of-the-art performance across a wide range of untrained network configurations. Furthermore, we show that there is a trade-off between model complexity and generality.