Extremely large-scale massive multiple-input multiple-output (XL-MIMO) systems have a much higher channel dimensionality and are subject to additional near-field propagation effects, which increases the computational load and makes it harder to acquire the prior knowledge needed for channel estimation. In this article, an XL-MIMO channel network (XLCNet) is developed to estimate the high-dimensional channel; it is a universal solution for both near-field and far-field users with different channel statistics. Furthermore, a compressed XLCNet (C-XLCNet) is designed via weight pruning and quantization to accelerate model inference and to facilitate model storage and transmission. Simulation results demonstrate the superior performance and universality of XLCNet. Compared with XLCNet, C-XLCNet incurs limited performance loss while reducing the computational complexity and model size by about $10 \times$ and $36 \times$, respectively.
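To illustrate the two compression steps named above, the following is a minimal sketch of weight pruning followed by quantization on a single weight matrix. It is not the C-XLCNet procedure: the abstract does not state the pruning criterion, the sparsity level, or the bit width, so magnitude-based pruning, a sparsity of 0.5, and 8-bit symmetric uniform quantization are assumptions made purely for illustration. The function names `magnitude_prune`, `quantize_uniform`, and `dequantize` are hypothetical.

```python
import numpy as np


def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of entries with the smallest magnitude.

    Assumption: magnitude-based pruning; the source does not name a criterion.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)  # number of weights to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest magnitude serves as the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold  # True where a weight is kept
    return weights * mask, mask


def quantize_uniform(weights, num_bits=8):
    """Symmetric uniform quantization to signed integers plus one float scale.

    Assumption: per-tensor scaling and an 8-bit default, chosen for illustration.
    """
    qmax = 2 ** (num_bits - 1) - 1
    max_abs = np.max(np.abs(weights))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    dtype = np.int8 if num_bits <= 8 else np.int32
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(dtype)
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate floating-point weights."""
    return q.astype(np.float32) * scale


# Toy weight matrix standing in for one layer of a network (hypothetical data).
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

w_pruned, mask = magnitude_prune(w, sparsity=0.5)
q, scale = quantize_uniform(w_pruned, num_bits=8)
w_hat = dequantize(q, scale)

print("kept fraction:", mask.mean())
print("max abs quantization error:", np.abs(w_hat - w_pruned).max())
```

In this sketch, pruning reduces the number of nonzero weights (and hence multiply-accumulate operations when sparsity is exploited), while quantization reduces the bits stored per remaining weight. The reduction factors reported above depend on the actual network and compression settings and are not reproduced by this toy example.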