The black-box nature of deep learning models has raised concerns about their interpretability for successful deployment in real-world clinical applications. To address the concerns, eXplainable Artificial Intelligence (XAI) aims to provide clear and understandable explanations of the decision-making process. In the medical domain, concepts such as attributes of lesions or abnormalities serve as key evidence for deriving diagnostic results. However, existing concept-based models mainly depend on concepts that appear independently and require fine-grained concept annotations such as bounding boxes. A medical image usually contains multiple concepts and the fine-grained concept annotations are difficult to acquire. In this paper, we propose a novel Concept-Attention Whitening (CAW) framework for interpretable skin lesion diagnosis. CAW is comprised of a disease diagnosis branch and a concept alignment branch. In the former branch, we train the CNN with a CAW layer inserted to perform skin lesion diagnosis. The CAW layer decorrelates features and aligns image features to conceptual meanings via an orthogonal matrix. In the latter branch, we calculate the orthogonal matrix under the guidance of the concept attention mask. We particularly introduce a weakly-supervised concept mask generator that only leverages coarse concept labels for filtering local regions that are relevant to certain concepts, improving the optimization of the orthogonal matrix. Extensive experiments on two public skin lesion diagnosis datasets demonstrated that CAW not only enhanced interpretability but also maintained a state-of-the-art diagnostic performance.