In this article, we present an intelligent framework for 5G new radio (NR) indoor positioning under a monostatic configuration. The primary objective is to estimate both the angle of arrival and time of arrival simultaneously. This requires capturing the pertinent information from both the antenna and subcarrier dimensions of the receive signals. To tackle the challenges posed by the intricacy of the high-dimensional information matrix, coupled with the impact of irregular array errors, we design a deep learning scheme. Recognizing that the phase difference between any two subcarriers and antennas encodes spatial information of the target, we contend that the transformer network is better suited for this problem compared to the convolutional neural network which excels in local feature extraction. To further enhance the network's fitting capability, we integrate the transformer with a model-based multiple-signal-classification (MUSIC) region decision mechanism. Numerical results and field tests demonstrate the effectiveness of the proposed framework in accurately calibrating the irregular angle-dependent array error and improving positioning accuracy.