Noisy data and the similarity in the ocular appearances caused by different ophthalmic pathologies pose significant challenges for an automated expert system to accurately detect retinal diseases. In addition, the lack of knowledge transferability and the need for unreasonably large datasets limit clinical application of current machine learning systems. To increase robustness, a better understanding of how the retinal subspace deformations lead to various levels of disease severity needs to be utilized for prioritizing disease-specific model details. In this paper we propose the use of disease-specific feature representation as a novel architecture comprised of two joint networks -- one for supervised encoding of disease model and the other for producing attention maps in an unsupervised manner to retain disease specific spatial information. Our experimental results on publicly available datasets show the proposed joint-network significantly improves the accuracy and robustness of state-of-the-art retinal disease classification networks on unseen datasets.