Predicting the quality of multimedia content is required in many fields. In some applications, quality metrics directly affect decision making, for example in diagnosis from medical multimedia content. In this paper, we focus on such applications and propose an efficient, shallow model for no-reference quality prediction of medical images from a small amount of annotated data. Our model is based on convolutional self-attention, which builds complex representations from relevant local characteristics of the image and slides over the image to interpolate the global quality score. We also apply domain adaptation in unsupervised and semi-supervised settings. The proposed model is evaluated on a dataset composed of several images and their corresponding subjective scores. The results show the efficiency of the proposed method, as well as the relevance of applying domain adaptation to generalize across multimedia domains for the downstream task of perceptual quality prediction.\footnote{Funded by the TIC-ART project, Regional fund (Region Centre-Val de Loire)}