Most codec designs rely on the mean squared error (MSE) as a fidelity metric in rate-distortion optimization, which allows to choose the optimal parameters in the transform domain but may fail to reflect perceptual quality. Alternative distortion metrics, such as the structural similarity index (SSIM), can be computed only pixel-wise, so they cannot be used directly for transform-domain bit allocation. Recently, the irregularity-aware graph Fourier transform (IAGFT) emerged as a means to include pixel-wise perceptual information in the transform design. This paper extends this idea by also learning a graph (and corresponding transform) for sets of blocks that share similar perceptual characteristics and are observed to differ statistically, leading to different learned graphs. We demonstrate the effectiveness of our method with both SSIM- and saliency-based criteria. We also propose a framework to derive separable transforms, including separable IAGFTs. An empirical evaluation based on the 5th CLIC dataset shows that our approach achieves improvements in terms of MS-SSIM with respect to existing methods.