Deep learning (DL) has emerged as a promising tool to downscale climate projections at regional-to-local scales from large-scale atmospheric fields following the perfect-prognosis (PP) approach. Given the complexity of DL models, it is crucial to evaluate them properly, especially when they are applied to changing climatic conditions where the ability to extrapolate/generalise is key. In this work, we intercompare several DL models taken from the literature on the same challenging use case (downscaling temperature in the CORDEX North America domain) and expand standard evaluation methods by building on eXplainable artificial intelligence (XAI) techniques. We show how these techniques can be used to unravel the internal behaviour of these models, providing new evaluation dimensions and aiding in their diagnosis and design. These results show the usefulness of incorporating XAI techniques into statistical downscaling evaluation frameworks, especially when working with large regions and/or under climate change conditions.